Causal guarantees are anything but casual

| | distributed systems

Traditional databases, because they service reads and writes from a single node, naturally provide sequential ordering guarantees for read and write operations known as "causal consistency". A distributed system can provide these guarantees, but in order to do so, it must coordinate and order related events across all of its nodes, and limit how fast certain operations can complete. While causal consistency is easiest to understand when all data ordering guarantees are preserved – mimicking a vertically scaled database, even when the system encounters failures like node crashes or network partitions – there exist many legitimate consistency and durability tradeoffs that all systems need to make.

MongoDB has been continuously running — and passing — Jepsen tests for years. Recently, we have been working with the Jepsen team to test for causal consistency. With their help, we learned how complex the failure modes become if you trade consistency guarantees for data throughput and recency.
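As a rough illustration of why causal guarantees limit how fast some operations can complete, here is a toy sketch. It is not MongoDB's implementation: the `Replica`, `Session`, `write`, and `causal_read` names are hypothetical, and a plain integer counter stands in for a real cluster clock. It shows a read that refuses to answer from a node that has not yet caught up with what the client has already seen.

```python
class Replica:
    """A node that applies writes in order; `applied` is its logical clock."""

    def __init__(self):
        self.data = {}
        self.applied = 0

    def apply(self, op_time, key, value):
        self.data[key] = value
        self.applied = op_time


class Session:
    """Tracks the latest operation time this client has observed."""

    def __init__(self):
        self.last_seen = 0


def write(primary, session, key, value):
    # Assumption: the primary hands out consecutive operation times.
    op_time = primary.applied + 1
    primary.apply(op_time, key, value)
    session.last_seen = op_time
    return op_time


def causal_read(replica, session, key):
    # A replica that is behind the session cannot serve a causally
    # consistent read; the caller has to wait or read from another node.
    if replica.applied < session.last_seen:
        raise RuntimeError("replica is behind; wait or read elsewhere")
    session.last_seen = max(session.last_seen, replica.applied)
    return replica.data.get(key)
```

In this sketch, a client that writes to the primary and then immediately reads from a lagging secondary gets an error instead of stale data, and the same read succeeds once the secondary has applied the write. That waiting is the cost of the guarantee.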

Pruning Dynamic Rebuilds With libabigail

| | c++ open source optimization

Complex C++ projects frequently struggle with lengthy build times. Splitting a project into multiple dynamically-linked components can give developers faster incremental rebuilds and shorter edit-compile-test cycles than relying on static linking, especially when there are a large number of test binaries. However, build systems usually do not realize all of the possible gains in dynamic incremental rebuilds due to how they handle transitive library dependencies. Red Hat's ABI introspection library libabigail offers one possible path to eliminating unnecessary transitive re-linking for some classes of source modifications.

Considering the Community Effects of Introducing an Official MongoDB Go Driver

| | golang drivers open source

What do you do when an open-source project you rely on no longer meets your needs?  When your choice affects not just you, but a larger community, what principles guide your decision?

Submitting patches is often the first option, but you're at the mercy of the maintainer to accept them.  If the changes you need are sweeping, substantial alterations, the odds of acceptance are low.  Eventually, only a few realistic options remain: find an alternative, fork the project, or write your own replacement.  Everyone who depends on open source faces this conundrum at one time or another.

After relying for years on the community-developed mgo Go driver for MongoDB, MongoDB has begun work on a brand-new, internally-developed, open-source Go driver.  We know that releasing a company-sponsored alternative to a successful, community-developed project creates tension and uncertainty for users, so we did not make this decision lightly. We carefully considered how our choice would affect current and future Go users of MongoDB.

Investing in CS4All: One Year Later

| | cs4all education

When a couple of New York City high school teachers partnered with MongoDB to teach computer science, did they succeed? Their curriculum was untested, and they were teaching in difficult districts where most students are from poor and minority families. I talked with these two teachers, Jeremy Mellema and Timothy Chen, back in September, when they had completed a summer fellowship at MongoDB and had just started teaching their curriculum; at the end of the academic year this spring, I visited Jeremy and Tim again to find out the result.

Their successes were sparse and partial. They discovered that their students' poor reading skills were a barrier to learning to code, and that teaching new coders how to solve problems is, itself, an unsolved problem. With a coarse unit of iteration—a school semester—it is painfully slow to experiment and find teaching methods that work. But even partial wins make a difference for individual kids, and the support of professional engineers at companies like MongoDB can be a powerful accelerant.

Farewell, Solaris

| |

[Update 8/29/17 3:45pm: edited to contain more specifics regarding Solaris downloads and security patches.]

Solaris was the first “real operating system” I ever used.

The Brown University Computer Science Department was a Sun Microsystems shop when I was an undergraduate there in the late 90s. When I took the operating systems lab class, CS-169, we implemented a toy version of Sun’s research operating system Spring OS. Several of my contemporaries in the CS Department went on to work at Sun, and developed or advanced many of the technologies that made Solaris great, like ZFS, dtrace, libumem, mdb, doors, and zones. The Solaris Linkers and Libraries Guide remains one of the best ways to develop an understanding of shared library internals. The first startup I worked for developed on Solaris x86, because the team knew Solaris well. Today, many of my co-workers on the server engineering team here at MongoDB share that formative experience with Solaris. We have a great deal of collective nostalgia and appreciation for Solaris and the amazing engineering effort that went into its development.

So it is, for many of us at MongoDB, bittersweet to announce that MongoDB is terminating support for Solaris.

Effective immediately, we plan to cease production of new builds of MongoDB for Solaris, across all supported versions of MongoDB. Existing release artifacts for Solaris will continue to be made available, but no new releases will be issued, barring a critical issue raised under an existing support contract covering MongoDB versions 3.0 through 3.4 running on Solaris. We will continue to fix critical flaws for the community, regardless of where found or how reported. Anyone can report a security vulnerability by using our Security project to create an account, then a ticket, describing the vulnerability.

This was not an easy decision for us to make, and we feel that it is important to provide some background on why we have made what may seem at first to be a capricious decision.

A MongoDB Engineering Response to the Anti-Diversity-Effort Manifesto

| | diversity

The following is an email I sent to the MongoDB engineering team this past Friday. At the behest of the team, I am now making it public. —Eliot

You’ve likely heard of the 10-page memo that was published by a (now former) Google employee regarding Google’s diversity efforts. Parts of the memo assert that the gender gap in tech is rooted in biological differences. I want to make it clear where MongoDB stands on this issue, where I stand on this issue, and what behavior we expect from employees in regard to this.

If everything was a level playing field, and had been for millennia, this whole conversation would be non-existent. The fact is, the historical legacy of sexism, racism, and a host of other isms is real and powerful. What really matters is that trying to undo that damage, while a necessarily imperfect endeavor, is nonetheless of vital importance, and that diversity efforts are an effective method for achieving that goal. If we all agree that working towards a level playing field is the goal, conversations about how best to achieve it can be responsibly had.

This manifesto, however, is not part of a healthy dialogue at all. It advances a false equivalence between diversity efforts and discrimination built on a substrate of reasonable statements and context-free references to research. It is just another attempt to disguise prejudice in the clothing of rationalism. History is littered with them, and we can only hope that our work will hasten their consignment to the dustbin of shameful ideas. For those of you with the luxury of reading this latest example without feeling directly threatened by it, understand that crediting the author with “some good points” provides cover for his conclusions, and contributes to a hostile environment for your peers.

And for the record, there are many reasons I care a lot about diversity, beyond the obvious and paramount moral ones. I am 100% convinced that a more diverse engineering team at MongoDB will make our products better. Diversity isn’t just a metric, it’s a means to a more inclusive way of thinking, and teams with more diverse opinions and thought processes can better understand how others will interact with things they build. Such teams can’t help but build better, more usable, more understandable products, thereby better fulfilling our core company mission of making developers productive.

As our Embrace the Power of Differences value states, our commitment to increasing diversity is not about changing our standards (which is what the memo implies). It’s instead about a commitment to source, interview, grow, and retain members of underrepresented groups who meet those standards. If you have any questions or concerns about this, please feel free to reach out to me.

Presenting WinKerberos

| | open source python kerberos

WinKerberos is a Python module providing Kerberos facilities to Python applications on Windows, where PyKerberos does not work. WinKerberos can be used as a drop-in replacement on Windows for client applications using PyKerberos.

Why write a new Kerberos module for Python?

MongoDB Enterprise Edition 2.4 added support for a new authentication mechanism: Kerberos V5 using the Generic Security Services API (GSSAPI). PyMongo, the Python driver for MongoDB, needed to support this new authentication mechanism.

There wasn't a lot of information available on using Kerberos in any Python application, to say nothing of one running on Windows. Nick Coghlan's 2011 article "Using the Python Kerberos Module" was the best information available at the time. Nick's article was about using PyKerberos to implement HTTP "Negotiate" authentication, whereas MongoDB uses a custom TCP wire protocol, but using the article's code examples, a few other sources, and a careful reading of section 3.1 of RFC 4752, I got Kerberos authentication in PyMongo working everywhere but Windows in a few days.

PyKerberos was written by Apple as part of their open source Calendar and Contacts Server project. It is a pure C Python extension module that builds against MIT Kerberos V5 or Heimdal and is well tested on macOS and Linux. Though there were rumors of people getting PyKerberos to work on Windows, we could never figure out how they did it.

When we released MongoDB Enterprise 2.4 on March 19, 2013, the number of users that needed support for PyMongo on Windows with Kerberos authentication appeared to be exactly zero. On May 22, 2013, we released PyMongo 2.5 with support for Kerberos authentication on the platforms PyKerberos supported. By 2016, after multiple requests for Kerberos support on Windows and an aborted attempt to implement support using kerberos-sspi, we decided to write a new Python module. This module would support Kerberos authentication on Windows for PyMongo and any other Python project that needed it.

Enter WinKerberos

WinKerberos is a pure C Python extension module that supports Python 2.6, 2.7, and 3.3+. It provides most of the client API of PyKerberos, but uses Microsoft's Security Support Provider Interface (SSPI) under the covers. PyMongo, Requests, and a few other projects use WinKerberos as the Kerberos provider on Windows. It is available on PyPI as prebuilt binary wheels and can be installed with pip without a C compiler:

 python -m pip install winkerberos

To add Windows support to an existing application that uses PyKerberos for client authentication, change:

import kerberos

to:

try:
    import winkerberos as kerberos
except ImportError:
    import kerberos

If you need to implement Kerberos authentication from scratch in your application, the README provides an example implementation to use with WinKerberos or PyKerberos.
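For orientation, the opening step of a client-side handshake might look roughly like the sketch below. It is not the README's example. It assumes the client functions that PyKerberos and WinKerberos share (`authGSSClientInit`, `authGSSClientStep`, `authGSSClientResponse`, `authGSSClientClean`); the helper names `load_kerberos` and `first_client_token` are hypothetical, and a real exchange continues with further steps driven by the server's replies.

```python
import importlib


def load_kerberos():
    """Return winkerberos if importable, else kerberos, else None."""
    for name in ("winkerberos", "kerberos"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def first_client_token(krb, service):
    """Start a GSSAPI client context and return the first base64 token.

    `krb` is whichever module load_kerberos() returned; `service` is a
    service principal such as "HTTP@server.example.com" (a placeholder).
    """
    _result, ctx = krb.authGSSClientInit(service)
    try:
        # The first step takes an empty challenge from the server.
        krb.authGSSClientStep(ctx, "")
        return krb.authGSSClientResponse(ctx)
    finally:
        # Always release the security context, even on failure.
        krb.authGSSClientClean(ctx)
```

Passing the module in as an argument keeps the handshake code identical on every platform; only `load_kerberos` knows which provider is installed.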

Help us improve WinKerberos

WinKerberos has implemented all of the features of PyKerberos that PyMongo needed since version 0.1. Since then, we have shipped six more releases that add features requested by the community and incorporate patches from users, including support for SPNEGO and RFC 5929 Channel Bindings. As a reimplementation of PyKerberos for Windows, WinKerberos is still incomplete. It lacks some of PyKerberos' client-side functions, like changePassword and getServerPrincipalDetails, and doesn't implement any of the server API. If you would like to see these features in WinKerberos, or you are adding a new feature to PyKerberos that should also exist in WinKerberos, we happily accept patches from the community. If you find a bug in WinKerberos, or want to request a new feature, please file a ticket in the GitHub project.

Breaking the WiredTiger Logjam: The Wait-Free Solution (2/2)

| | optimization c concurrency

Part one of this pair explored the original algorithm the WiredTiger write-ahead log used to consolidate writes in order to minimize I/O. It used atomic compare-and-swap operations in two phases to accomplish this without time-consuming locking. This algorithm worked extremely well as long as there were no more than a few threads running per core. But its reliance on busy-waiting to avoid locking caused a logjam when the number of threads increased beyond that limit -- a serious problem given that many MongoDB workloads would have a large number of threads per core. This issue was blocking MongoDB’s goal of making WiredTiger the default storage engine in v3.2.

This story has a happy ending thanks to my colleague, Senior Technical Service Engineer Bruce Lucas. Bruce had initially uncovered the logjam and reported it to me; together, we overcame it without compromising any other workloads. Because Bruce’s mindset was not colored by the legacy of the original approach, he was able to provide the critical insight that paved the way for the solution, allowing WiredTiger to become the default storage engine in v3.2.

Breaking the WiredTiger Logjam: The Write-Ahead Log (1/2)

| | optimization c concurrency

Code can't be optimized in the abstract; it can only be optimized for a particular set of conditions. When conditions change, optimizations can become bottlenecks, and when that happens, a thorough audit of assumptions might well hold the key to the solution.

The WiredTiger write-ahead log exemplifies this principle. It’s a critical codepath within a high-performance storage engine, and I have optimized it heavily to avoid I/O and locking. But some of the conditions I had initially targeted became invalid when WiredTiger became a storage engine in MongoDB. When a colleague of mine investigated a case of negative scaling found during testing, he uncovered a serious bottleneck in the write-ahead log… call it a "logjam". That investigation ultimately led us to rethink our assumptions and optimize for new conditions. We validated the new approach with a quick prototype, and then covered all the intricacies and edge cases to produce a fully realized solution.

In part one of this two-part series, I’ll dive deep into the innards of the WiredTiger write-ahead log. I’ll show how it orchestrates many threads writing to a single buffer without locking, and I’ll explain how two conflicts between that design and the new conditions produced the logjam. Part two will focus on how we eliminated the bottleneck. I’ll analyze its root causes, describe the key insight that enabled our solution, and detail the new algorithm and how it reflects our current conditions.