What do you do when an open-source project you rely on no longer meets your needs? When your choice affects not just you, but a larger community, what principles guide your decision?
Submitting patches is often the first option, but you're at the mercy of the maintainer to accept them. If the changes you need are sweeping, the odds of acceptance are low. Eventually, only a few realistic options remain: find an alternative, fork the project, or write your own replacement. Everyone who depends on open source faces this conundrum at one time or another.
After relying for years on the community-developed mgo Go driver for MongoDB, MongoDB has begun work on a brand-new, internally developed, open-source Go driver. We know that releasing a company-sponsored alternative to a successful, community-developed project creates tension and uncertainty for users, so we did not make this decision lightly. We carefully considered how our choice would affect current and future Go users of MongoDB.
We’ve all been there: you’re pitching a solution when one of your team members interjects, “let’s not reinvent the wheel, here.” Whether it’s based on fear or wisdom, the charge of reinventing the wheel is a death sentence for ideas. It typically isn’t worth the time and resources to implement a new version of an old, ubiquitous idea—though you’d never know that with all the different kinds of actual, literal wheels you use every day.
For most developers, continuous integration (CI), the automated building and testing of new code pushed into your repository, is one of those never-reinvented wheels. You set up one of a few long-standing solutions like Travis or Jenkins, rejigger your test code to fit into that solution’s organizational model, and then avoid messing with it too much. Here at MongoDB, challenging this approach paid off handsomely.
Instead of working around an off-the-shelf solution that didn’t fit our needs, we reinvented the wheel and built our own continuous integration system, called Evergreen. It gives us a powerful, efficient infrastructure that lets us test changes quickly, and it keeps our engineers happy as well. Our journey to creating Evergreen was born of necessity and stalked by uncertainty, but we don’t regret it. Reinventing the wheel allowed us to build a near-perfect CI tool for our use case, seriously evaluate powerful new technologies, and have a lot of fun doing it.
The Go language is great for concurrency, but when you have to do work that is naturally serial, must you forgo those benefits? We faced this question while rewriting our database backup utility, mongodump, and used a “divide-and-multiplex” method to marry a high-throughput concurrent workload with a serial output.
The Need for Concurrency
In MongoDB, data is organized into collections of documents. Reads from a collection are often preempted when other processes obtain a write lock on that collection. To keep those stalls from reducing overall throughput, you can enqueue reads from multiple collections at once. For that reason, a previous version of mongodump read data from several collections concurrently to achieve maximum throughput.
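To make the idea concrete, here is a minimal sketch of reading several collections at once, so that a stall on one does not hold up the others. It is not the mongodump code: `fakeRead`, `readAll`, and the collection names are hypothetical stand-ins, and a real reader would fetch documents from the server instead of generating strings.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeRead is a hypothetical stand-in for reading n documents from one
// collection. A real implementation would query the database here.
func fakeRead(name string, n int) []string {
	docs := make([]string, n)
	for i := range docs {
		docs[i] = fmt.Sprintf("%s/doc%d", name, i)
	}
	return docs
}

// readAll starts one goroutine per collection and waits for all of them.
// A slow (for example, lock-blocked) collection delays only its own goroutine.
func readAll(sizes map[string]int) map[string][]string {
	var (
		mu  sync.Mutex // guards out, which every goroutine writes to
		wg  sync.WaitGroup
		out = make(map[string][]string, len(sizes))
	)
	for name, n := range sizes {
		wg.Add(1)
		go func(name string, n int) {
			defer wg.Done()
			docs := fakeRead(name, n)
			mu.Lock()
			out[name] = docs
			mu.Unlock()
		}(name, n)
	}
	wg.Wait()
	return out
}

func main() {
	got := readAll(map[string]int{"users": 2, "orders": 3})
	fmt.Println(len(got["users"]), len(got["orders"]))
}
```

Note that this version collects each collection's results separately, which mirrors the one-file-per-collection limitation described next.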
However, since the old mongodump wrote each collection to a separate file, it did not work for two very common use cases for database backup utilities: 1) streaming the backup over a network, and 2) streaming the backup directly into another instance as part of a load operation. Our new version was designed to support these use cases.
To do that while preserving the throughput-maximizing properties of concurrent reads, we used several Go constructs, including reflection and channels, to let multiple goroutines safely feed data into the archive concurrently. Let me show you how.