The following is an email I sent to the MongoDB engineering team this past Friday. At the behest of the team, I am now making it public. —Eliot
You’ve likely heard of the 10-page memo that was published by a (now former) Google employee regarding Google’s diversity efforts. Parts of the memo assert that the gender gap in tech is rooted in biological differences. I want to make it clear where MongoDB stands on this issue, where I stand on this issue, and what behavior we expect from employees in regard to this.
If everything were a level playing field, and had been for millennia, this whole conversation would be non-existent. The fact is, the historical legacy of sexism, racism, and a host of other isms is real and powerful. What really matters is that trying to undo that damage, while a necessarily imperfect endeavor, is nonetheless of vital importance, and that diversity efforts are an effective method for achieving that goal. If we all agree that working towards a level playing field is the goal, conversations about how best to achieve it can be responsibly had.
This manifesto, however, is not part of a healthy dialogue at all. It advances a false equivalence between diversity efforts and discrimination built on a substrate of reasonable statements and context-free references to research. It is just another attempt to disguise prejudice in the clothing of rationalism. History is littered with them, and we can only hope that our work will hasten their consignment to the dustbin of shameful ideas. For those of you with the luxury of reading this latest example without feeling directly threatened by it, understand that crediting the author with “some good points” provides cover for his conclusions, and contributes to a hostile environment for your peers.
And for the record, there are many reasons I care a lot about diversity, beyond the obvious and paramount moral ones. I am 100% convinced that a more diverse engineering team at MongoDB will make our products better. Diversity isn’t just a metric, it’s a means to a more inclusive way of thinking, and teams with more diverse opinions and thought processes can better understand how others will interact with things they build. Such teams can’t help but build better, more usable, more understandable products, thereby better fulfilling our core company mission of making developers productive.
As our Embrace the Power of Differences value states, our commitment to increasing diversity is not about changing our standards (which is what the memo implies). It’s instead about a commitment to source, interview, grow, and retain members of underrepresented groups who meet those standards. If you have any questions or concerns about this, please feel free to reach out to me.
WinKerberos is a Python module providing Kerberos facilities to Python applications on Windows, where PyKerberos does not work. WinKerberos can be used as a drop-in replacement on Windows for client applications using PyKerberos.
Why write a new Kerberos module for Python?
MongoDB Enterprise Edition 2.4 added support for a new authentication mechanism: Kerberos V5 using the Generic Security Services API (GSSAPI). PyMongo, the Python driver for MongoDB, needed to support this new authentication mechanism.
There wasn't a lot of information available on using Kerberos in any Python application, to say nothing of one running on Windows. Nick Coghlan's 2011 article "Using the Python Kerberos Module" was the best information available at the time. Nick's article was about using PyKerberos to implement HTTP "Negotiate" authentication, whereas MongoDB uses a custom TCP wire protocol, but using the article's code examples, a few other sources, and a careful reading of section 3.1 of RFC 4752, I got Kerberos authentication in PyMongo working everywhere but Windows in a few days.
PyKerberos was written by Apple as part of their open source Calendar and Contacts Server project. It is a pure C Python extension module that builds against MIT Kerberos V5 or Heimdal and is well tested on macOS and Linux. Though there were rumors of people getting PyKerberos to work on Windows, we could never figure out how they did it.
When we released MongoDB Enterprise 2.4 on March 19, 2013, the number of users that needed support for PyMongo on Windows with Kerberos authentication appeared to be exactly zero. On May 22, 2013, we released PyMongo 2.5 with support for Kerberos authentication on the platforms PyKerberos supported. By 2016, after multiple requests for Kerberos support on Windows and an aborted attempt to implement support using kerberos-sspi, we decided to write a new Python module. This module would support Kerberos authentication on Windows for PyMongo and any other Python project that needed it.
To add Windows support to an existing application that uses PyKerberos for client authentication, change the import kerberos statement to:
import winkerberos as kerberos
If you need to implement Kerberos authentication from scratch in your application, the README provides an example implementation to use with WinKerberos or PyKerberos.
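For code that has to run both on Windows and on platforms where PyKerberos works, one possible pattern is to try each module in turn and bind whichever one imports to a single name. The sketch below is only an illustration: the helper name load_kerberos_module and the order of the candidates are assumptions, not something taken from the WinKerberos README.

```python
import importlib


def load_kerberos_module(candidates=("kerberos", "winkerberos")):
    """Return the first importable module named in candidates, or None.

    The helper name and the candidate order are hypothetical choices
    made for this sketch.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This module is not installed here; try the next candidate.
            continue
    return None


kerberos = load_kerberos_module()
if kerberos is None:
    print("Neither PyKerberos nor WinKerberos is installed")
```

The rest of the application can then keep calling functions on the name kerberos, whichever module was actually loaded.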
Help us improve WinKerberos
WinKerberos has implemented all of the features of PyKerberos that PyMongo needed since version 0.1. Since then, we have shipped six more releases that add features requested by the community and incorporate patches from users adding support for SPNEGO and RFC 5929 Channel Bindings. As a reimplementation of PyKerberos for Windows, WinKerberos is still incomplete. It lacks some of PyKerberos' client-side functions, like changePassword and getServerPrincipalDetails, and doesn't implement any of the server API. If you would like to see these features in WinKerberos, or you are adding a new feature to PyKerberos that should also exist in WinKerberos, we happily accept patches from the community. If you find a bug in WinKerberos, or want to request a new feature, please file a ticket in the GitHub project.
Part one of this pair explored the original algorithm the WiredTiger write-ahead log used to consolidate writes in order to minimize IO. It used atomic compare-and-swap operations in two phases to accomplish this without time-consuming locking. This algorithm worked extremely well as long as there were no more than a few threads running per core. But its reliance on busy-waiting to avoid locking caused a logjam when the number of threads increased beyond that limit -- a serious problem given that many MongoDB workloads would have a large number of threads per core. This issue was blocking MongoDB’s goal of making WiredTiger the default storage engine in v3.2.
This story has a happy ending thanks to my colleague, Senior Technical Service Engineer Bruce Lucas. Bruce had initially uncovered the logjam and reported it to me; together, we overcame it without compromising any other workloads. Because Bruce’s mindset was not colored by the legacy of the original approach, he was able to provide the critical insight that paved the way for the solution, allowing WiredTiger to become the default storage engine in v3.2.
Code can't be optimized; it can only be optimized for a set of conditions. When conditions change, optimizations can become bottlenecks, and when that happens, a thorough audit of assumptions might well hold the key to the solution.
The WiredTiger write-ahead log exemplifies this principle. It’s a critical codepath within a high-performance storage engine, and I have optimized it heavily to avoid I/O and locking. But some of the conditions I had initially targeted became invalid when WiredTiger became a storage engine in MongoDB. When a colleague of mine investigated a case of negative scaling found during testing, he uncovered a serious bottleneck in the write-ahead log… call it a "logjam". That investigation ultimately led us to rethink our assumptions and optimize for new conditions. We validated the new approach with a quick prototype, and then covered all the intricacies and edge cases to produce a fully realized solution.
In part one of this two-part series, I’ll dive deep into the innards of the WiredTiger write-ahead log. I’ll show how it orchestrates many threads writing to a single buffer without locking, and I’ll explain how two conflicts between that design and the new conditions produced the logjam. Part two will focus on how we eliminated the bottleneck. I’ll analyze its root causes, describe the key insight that enabled our solution, and detail the new algorithm and how it reflects our current conditions.
On January 10, I released a badly broken version of the MongoDB C Driver, libmongoc 1.5.2. For most users, that version could not connect to a server at all! Luckily, in under 24 hours a developer reported the bug, I reverted the mistake and released a fix. Although it was resolved before it did any damage, this is among the most dramatic mistakes I've made since I switched from the PyMongo team to libmongoc almost two years ago. My error stemmed from three mistaken assumptions I've had ever since I changed projects. What were they?
Here's how the story began. In December, a libmongoc user named Alexey pointed out a longstanding limitation: it would only resolve hostnames to IPv4 addresses. Even if IPv6 address records existed for a hostname, the driver would not look them up -- when it called getaddrinfo on the hostname to do the DNS resolution, it passed AF_INET as the address family, precluding anything but IPv4. So if you passed the URI mongodb://example.com, libmongoc resolved "example.com" to an IPv4 address like 220.127.116.11 and tried to connect to it. If the connection timed out, the driver gave up.
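The effect of the address family argument is easy to reproduce from Python, whose socket.getaddrinfo wraps the same C function. The following is a minimal sketch rather than libmongoc's code: the resolve helper and the FAMILY_NAMES table are names invented for this illustration, and the port number is just MongoDB's default.

```python
import socket

# Human-readable labels for the two address families of interest.
FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def resolve(host, port, family):
    """Return (family name, address) pairs for host, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # No address of the requested family could be found.
        return []
    return [(FAMILY_NAMES.get(af, str(af)), sockaddr[0])
            for af, _type, _proto, _canon, sockaddr in infos]


# AF_INET restricts the answer to IPv4, as the driver's call did.
print(resolve("localhost", 27017, socket.AF_INET))
# AF_UNSPEC lets the resolver return IPv4 and IPv6 addresses alike.
print(resolve("localhost", 27017, socket.AF_UNSPEC))
```

Passing an IPv6 literal such as "::1" together with AF_INET yields an empty list here, which mirrors how the AF_INET restriction rules out anything but IPv4.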
What do you do with a third-party tool that proves your application lacks a feature? Add that tool to your continuous integration system (after adding the feature, of course)! In our case we have added linearizable reads to MongoDB 3.4 and use Jepsen to test them.
What is Linearizability?
Linearizability is a property of distributed systems first introduced by Herlihy & Wing in their July 1990 article "Linearizability: a correctness condition for concurrent objects" (ACM Transactions on Programming Languages and Systems Journal). Peter Bailis probably provides the most accessible explanation of linearizability: "writes should appear to be instantaneous. Imprecisely, once a write completes, all later reads (where “later” is defined by wall-clock start time) should return the value of that write or the value of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write."
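To make that informal rule concrete, here is a small sketch that checks a recorded history of operations on a single value for stale reads. It is not how Jepsen works. It only encodes the imprecise wall-clock rule quoted above, and it assumes that every write carries a version number that grows with each successive write. The Op class and the stale_reads function are names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str      # "write" or "read"
    start: float   # wall-clock time the operation began
    end: float     # wall-clock time the operation completed
    version: int   # version written, or version returned by the read


def stale_reads(history):
    """Return the reads that break the informal rule quoted above.

    Assumption for this sketch: a higher version means a later write.
    """
    violations = []
    for op in history:
        if op.kind != "read":
            continue
        # Operations that completed before this read started are "earlier".
        earlier = [o for o in history if o.end < op.start]
        # The read may not return anything older than what earlier
        # writes stored or earlier reads already observed.
        floor = max((o.version for o in earlier), default=0)
        if op.version < floor:
            violations.append(op)
    return violations


history = [
    Op("write", 0, 1, 1),
    Op("read", 2, 3, 1),   # fine: sees the completed write
    Op("write", 4, 5, 2),
    Op("read", 6, 7, 1),   # stale: version 2 was already written
]
print(stale_reads(history))
```

Only the last read is reported, because it started after the second write had completed yet returned the older value.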
Recently we published a piece by A. Jesse Jiryu Davis about his undertaking to prove that getaddrinfo was thread-safe on OS X, thus enabling Python to do away with an unnecessary and troublesome mutex around hostname resolution. The convolutions of tracking down that evidence, and the shroud of secrecy involved in all correspondence with Apple, inspired us to render the piece in a whimsical, high-fantasy style. It was called "The Saga of Concurrent DNS in Python, and the Defeat of the Wicked Mutex Troll".
It seems the unconventional style has inspired a couple of dramatic readings, which we’re just thrilled about. We’d love to share them with you.
That challenge was taken up by Mason Egger, who created his YouTube channel BSD Synergy to serve those not yet ready for the firehose of insider expertise that is BSD Now. He was so tickled with what he heard on BSD Now that he read the entire thing for BSD Synergy episode 20, against a backdrop of images that set the scene beautifully.
Allan, Mason, thank you both for bringing our work to life!
We cannot confirm any rumors that we are in talks with Peter Jackson to do a motion picture adaptation of this story.
A suggestion we received both in this discussion and on lobste.rs was to use canvas to render the data points. We had originally avoided canvas because of time constraints, lack of team familiarity with canvas, and the complications it introduced with regard to mouse interactions. However, Noah proposed a combination of SVG and canvas that strikes a balance between canvas' performance and SVG's convenience, complete with a demo. It piqued my interest, and so I decided to explore it in some more detail here.
Tell us about the time you made DNS resolution concurrent in Python on Mac and BSD.
No, no, you do not want to hear that story, my friends. It is nothing but old lore and #ifdefs.
But you made Python more scalable. The saga of Steve Jobs was sung to you by a mysterious wizard with a fanciful nickname! Tell us!
Gather round, then. I will tell you how I unearthed a lost secret, unbound Python from old shackles, and banished an ancient and horrible Mutex Troll.
Let us begin at the beginning...
A long time ago, in the 1980s, a coven of Berkeley sorcerers crafted an operating system. They named it after themselves: the Berkeley Software Distribution, or BSD. For generations they nurtured it, growing it and adding features. One night, they conjured a powerful function that could resolve hostnames to IPv4 or IPv6 addresses. It was called getaddrinfo. The function was mighty, but in years to come it would grow dangerous, for the sorcerers had not made getaddrinfo thread-safe.
As ages passed, BSD spawned many offspring. There were FreeBSD, OpenBSD, NetBSD, and in time, Mac OS X. Each made its copy of getaddrinfo thread safe, at different times and in different ways. Some operating systems retained scribes who recorded these events in the annals. Some did not.
Because getaddrinfo is ringed round with mystery, the artisans who make cross-platform network libraries have mistrusted it. Is it thread safe or not? Often, they hired a Mutex Troll to stand guard and prevent more than one thread from using getaddrinfo concurrently. The most widespread such library is Python's own socket module, distributed with Python's standard library. On Mac and other BSDs, the Python interpreter hires a Mutex Troll, who demands that each Python thread hold a special lock while calling getaddrinfo.
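For those who would see the Troll at his post, here is a rough likeness drawn in Python itself. It is only a sketch: the interpreter's real guard lives in its C source, and the names guarded_getaddrinfo and resolve_concurrently are invented for this illustration.

```python
import socket
import threading

# The "Mutex Troll": one lock shared by every thread in the process.
_getaddrinfo_lock = threading.Lock()


def guarded_getaddrinfo(host, port):
    """Resolve host while holding the lock, so lookups never overlap."""
    with _getaddrinfo_lock:
        return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)


def resolve_concurrently(hosts, port=80):
    """Start one thread per host; each must wait its turn at the lock."""
    results = {}

    def worker(host):
        results[host] = guarded_getaddrinfo(host, port)

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(sorted(resolve_concurrently(["127.0.0.1", "127.0.0.2"])))
```

However many threads are started, only one at a time passes the lock, so a slow lookup in one thread delays every other thread waiting to resolve a name.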
Fuzz testing is a method for subjecting a codebase to a tide of hostile input to supplement the test cases engineers create on their own. In part one of this pair, we looked at the hybrid nature of our fuzzer -- how it combines “smart” and “dumb” fuzzing to produce input random enough to provoke bugs, but structured enough to pass input validation and test meaningful codepaths. To wrap up, I’ll discuss how we isolate signal from the noise a fuzzer intrinsically produces, and the tooling that augments the root cause analyses we do when the fuzzer finds an issue.