Considering the Community Effects of Introducing an Official MongoDB Go Driver

What do you do when an open-source project you rely on no longer meets your needs?  When your choice affects not just you, but a larger community, what principles guide your decision?

Submitting patches is often the first option, but you're at the mercy of the maintainer to accept them.  If the changes you need are sweeping, substantial alterations, the odds of acceptance are low.  Eventually, only a few realistic options remain: find an alternative, fork the project, or write your own replacement.  Everyone who depends on open source faces this conundrum at one time or another.

After relying for years on the community-developed mgo Go driver for MongoDB, MongoDB has begun work on a brand-new, internally-developed, open-source Go driver.  We know that releasing a company-sponsored alternative to a successful, community-developed project creates tension and uncertainty for users, so we did not make this decision lightly. We carefully considered how our choice would affect current and future Go users of MongoDB.

First, some history: Gustavo Niemeyer first announced the mgo community driver in March 2011, around the same time that MongoDB released version 1.8.0 of the database.  mgo currently has over 1,800 stars on GitHub and 32 contributors, including several current and former MongoDB employees. The incredible success of MongoDB in the Go community owes a great deal to Gustavo and mgo.

MongoDB itself is part of this community.  As the Go language matured and gained in popularity, MongoDB found many uses for it internally.  Some of the projects using it include:

  • Our remote agents for automated deployment, for backup, and for monitoring.
  • Our command-line operations tools, like mongodump (rewritten in Go for the 3.0 server release).
  • Our home-grown continuous integration system, Evergreen.
  • Our cloud products, like MongoDB Atlas and Stitch, which have major components written in Go.

From this experience, our engineers contributed back to mgo: over half a dozen employees have commits in mgo, accounting for over 2000 lines of changes.

But the more we used mgo, the more we discovered limitations.

With our in-house drivers – covering popular languages with deep commercial adoption – we often start driver feature development in parallel with server feature development so that we can test them as soon as the server merges a feature. But as a community project, mgo's feature support generally lags MongoDB server development. More critically, our products that use mgo can't easily test against or take advantage of new server features. Even if we thought that Go didn't yet have critical mass in our user base to justify an in-house driver, our own company's products can't wait for new features.

Sometimes, we patched a private copy of mgo to implement new features we critically needed.  This isn't always easy.  In 2015, we announced our next generation drivers, built upon a published set of specifications for driver behavior.  Because mgo predates this work, its conventions and internals don't match our specifications.  When the server implements new features and the driver development team writes specs to match, these new specs assume implementation of prior specs.  Developing comparable features in mgo can mean starting from a completely different base.

Not only does mgo have different internal conventions and behaviors than our in-house drivers, it encapsulates these behaviors in ways we found constraining. Usually, encapsulation is a good thing – a sign of good design – but many of our products benefit from low-level access to sockets, wire protocol models and encoding.  End-users don't need this access, but we have the knowledge to work with our own communication protocols and message formats safely and to great effect.

For example, our mongoreplay tool lets users replay a tcpdump of MongoDB server requests against a different server or cluster.  When replaying the workload, we need server connection and authentication features – part of mgo's public API – but to replicate per-connection traffic we also need direct control over the number of socket connections and the socket message traffic, all of which is private.  To enqueue requests and to read responses we need access to the types representing the wire protocol messages – also private types that are never visible to end users.

Over time, we found ourselves copying-and-pasting parts of mgo source into project-specific libraries, or re-implementing parts of the wire protocol or driver behaviors directly.

There is a real cost in the time it takes engineers to patch mgo or to write, fix and extend a plethora of internal libraries, plus the opportunity cost of our own products not being able to use our own server's latest features.  We decided to consolidate and standardize on one implementation to address all these needs.  We considered two alternatives:

  • Fork mgo completely – developing at our pace, modifying internals as needed, and extending the APIs to suit our needs.
  • Develop a new driver – building from the ground up to our specifications, putting it on par with our other officially-maintained drivers.

Forking mgo would have a handful of benefits but many challenges.  In the benefits column, forking would minimize the impact on our existing products that use mgo, as well as on any user who chose to use our fork over the original.  In the challenges column, we identified both technical and social considerations that gave us pause.

On the technical side, a fork wouldn't close the large gap to our common specifications, making new feature development much harder than for our internally-developed drivers.  It also raises a tough question: what if we implement a new feature in our fork only to find that mgo implements it a different way?  The more we might take the internal architecture and the API in a different direction from mgo, the harder it would be to keep our fork a "drop-in" replacement and the harder it would be to send patches upstream or to merge in upstream development.  We felt a fork would quickly become an independent, backwards-incompatible product despite a common lineage, undercutting the alleged benefit of forking.

On the social side, we knew that anything we released – whether a fork or a new driver –  could have a disruptive effect on the existing mgo community. We didn't want to discourage anyone happy using mgo with MongoDB from continuing to use it. We wanted to invite people who wanted something more to try something new, rather than – via forking – implicitly asking people to pick sides in a project they already use.  Forking could also imply that we would take on mgo's technical debt, which we wanted to avoid.

In light of these challenges, we decided instead to write a new, independently-developed Go driver to join the eleven other drivers in our officially-maintained driver ecosystem.  A fresh start allows us to focus our efforts on four main benefits:

  • Velocity: once complete, the new Go driver will evolve as fast as the server does.  We'll be able to dog-food new features internally before each server GA release.
  • Consistency: the new Go driver will follow our common specifications from the outset, so the new driver API will feel like other MongoDB drivers, shortening the learning curve for users.  We'll also be staying idiomatic to Go, such as supporting context objects for cancellable requests.
  • Performance: a new driver gives an opportunity to provide a new, higher-performance BSON library and design the driver API in a way that gives users more control over memory allocations.
  • Low-level API: for our own in-house products and other power users, we will provide low-level components for reuse, reducing code duplication across the company.  Unlike the rest of the driver, this API will have no stability guarantee and no end-user support, but it will let us develop better products faster and our users will benefit that way.

Fortunately, we were able to start from a prototype driver custom developed for our BI Connector – written by a former driver engineer – and build from that base towards the common driver specification.  We're now finalizing the details of the new BSON library and the core CRUD API.

What's next for the driver?  In the coming months, we'll ship an "alpha" release of the Go driver and make the code repository public. At that point we'll ask members of the Go-using MongoDB community to try it out and help us improve it with their feedback.

Update, 2/19/2018: The new driver is now in alpha. Please read the announcement for more info about trying it out.

If you're interested in being notified when the alpha is available, or if you have thoughts on this article you'd like to share, please email me at david@mongodb.com or subscribe to the mongodb-announce Google group. I look forward to hearing from you.