Transpiling between any programming languages (Part 1)

Tags: Transpilers, ANTLR, Compass, BSON, teaching tools, aggregation pipelines, Compilers, Engineering

The Compass team has written a lightweight, extensible transpiler that translates BSON to and from any language. This tool lets developers work in one language while exporting the queries and aggregations they build to other languages. Most compilers are one-to-one; one-to-many and many-to-one designs are less common, and many-to-many transpilers are rare. To avoid starting from scratch, we leveraged the open source parsing tool ANTLR, which provided us with a set of compiler tools along with preexisting grammars for the languages we needed. We minimized the additional complexity with a creative class hierarchy that reduced the amount of work needed from n² to n.
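The n² → n reduction comes from routing every translation through a shared intermediate form, so each language contributes one parser and one generator instead of one translator per language pair. A minimal sketch of that idea, with hypothetical mini "languages" (string-literal syntax only) standing in for the real grammars:

```python
# Hypothetical sketch: a many-to-many transpiler built around a shared
# intermediate tree needs one parser plus one generator per language
# (O(n) components) rather than one translator per language pair (O(n^2)).
from dataclasses import dataclass


@dataclass
class StringLiteral:
    """A node in the shared intermediate representation."""
    value: str


# Per-language generators: each knows only the shared tree, never
# another language's syntax.
def to_python(node: StringLiteral) -> str:
    return repr(node.value)            # 'hello'


def to_java(node: StringLiteral) -> str:
    return '"%s"' % node.value         # "hello"


# Per-language parsers (trivially simple here; ANTLR grammars in reality).
def parse_quoted(text: str) -> StringLiteral:
    return StringLiteral(text.strip("'\""))


GENERATORS = {"python": to_python, "java": to_java}


def translate(text: str, target: str) -> str:
    """Parse into the shared tree, then emit in the target language."""
    return GENERATORS[target](parse_quoted(text))


print(translate("'hello'", "java"))    # "hello"
```

Adding an (n+1)th language means writing one new parser and one new generator; all existing language pairs gain support for it automatically.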

Repeatable performance tests: CPU options are best disabled

Tags: performance testing, EC2, CPU options, performance profiling

In an effort to improve repeatability, the MongoDB Performance team set out to reduce noise on several performance test suites run on EC2 instances. After thinking about our assumptions and the experiment setup, we began by recording data about our current setup and found no evidence of particularly good or bad EC2 instances. In the next step, we investigated IO and found that EBS instances are the stable option for us. Having found very stable behavior as far as disks were concerned, in this third and final experiment we turn to tuning CPU-related knobs to minimize noise from that part of the system.

Repeatable performance tests: EBS instances are the stable option

Tags: performance profiling, performance, EBS, IO

In an effort to improve repeatability, the MongoDB Performance team set out to reduce noise on several performance test suites run on EC2 instances. After thinking about our assumptions and the experiment setup, we began by recording data about our current setup and found no evidence of particularly good or bad EC2 instances. However, we found that the results of repeated tests had a high variance. From our test data and our knowledge of the production system, we observed that many of the noisiest tests did the most IO (being sensitive to either IO latency or bandwidth). After performing the first baseline tests, we therefore decided to focus on IO performance, testing both AWS instance types and the IO configuration on those instances.

Repeatable performance tests: EC2 instances are neither good nor bad

Tags: performance profiling, performance, EC2, testing

In an effort to improve repeatability, the MongoDB Performance team set out to reduce noise on several performance test suites run on EC2 instances. At the beginning of the project, it was unclear whether our goal of running repeatable performance tests in a public cloud was achievable. Instead of debating the issue based on assumptions and beliefs, we decided to measure noise itself and see if we could make configuration changes to minimize it.

After thinking about our assumptions and the experiment setup, we began by recording data about our current setup.
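"Measuring noise itself" can be as simple as computing the run-to-run coefficient of variation (stddev / mean) for each test across repeated identical runs, then flagging tests whose noise would swamp the regressions being looked for. A hypothetical sketch (the test names, numbers, and 5% threshold are illustrative, not from the original posts):

```python
# Hypothetical sketch: quantify run-to-run noise as the coefficient of
# variation of repeated benchmark results, and flag tests whose noise
# exceeds the size of the performance changes we want to detect.
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Relative spread of repeated measurements of the same test."""
    return stdev(samples) / mean(samples)


def noisy_tests(results, threshold=0.05):
    """results maps test name -> throughput per repeated identical run."""
    return sorted(name for name, runs in results.items()
                  if coefficient_of_variation(runs) > threshold)


runs = {
    "insert_vector": [101_000, 99_500, 100_400],   # ~0.8% CV: quiet
    "mixed_findOne": [80_000, 62_000, 91_000],     # ~19% CV: noisy
}
print(noisy_tests(runs))   # ['mixed_findOne']
```

Recording this baseline before changing anything is what lets later configuration changes (IO, CPU options) be judged on evidence rather than assumptions.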

Reducing variability in performance tests on EC2: Setup and Key Results

Tags: performance profiling, performance, EC2, testing

On the MongoDB Performance team, we use EC2 to run daily system performance tests. After building a continuous integration system for performance testing, we realized that there were sources of random variation in our platform and system configuration which made a lot of our results non-reproducible. The run-to-run variation from the platform was bigger than the changes in MongoDB performance that we wanted to capture. To reduce this variation (environmental noise) in our test configuration, we set out on a project to measure and control for the EC2 environments on which we run our tests.

Causal guarantees are anything but casual

Tags: distributed systems

Traditional databases, because they service reads and writes from a single node, naturally provide sequential ordering guarantees for read and write operations known as "causal consistency". A distributed system can provide these guarantees, but in order to do so, it must coordinate and order related events across all of its nodes, and limit how fast certain operations can complete. While causal consistency is easiest to understand when all data ordering guarantees are preserved – mimicking a vertically scaled database, even when the system encounters failures like node crashes or network partitions – there exist many legitimate consistency and durability tradeoffs that all systems need to make.

MongoDB has been continuously running — and passing — Jepsen tests for years. Recently, we have been working with the Jepsen team to test for causal consistency. With their help, we learned how complex the failure modes become if you trade consistency guarantees for data throughput and recency.
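The coordination described above can be pictured with a toy model: a session carries the logical timestamp of its last write, and a read is only served by a node that has applied operations at least that far, which is what preserves "read your own writes" across a lagging replica. This is a deliberately simplified sketch of the general idea, not MongoDB's actual implementation:

```python
# Hypothetical toy model of causal consistency: sessions track a logical
# timestamp watermark, and reads refuse (or would wait on) replicas that
# have not yet applied the session's causally-prior writes.
class Replica:
    def __init__(self):
        self.applied = 0      # highest operation timestamp applied
        self.data = {}

    def apply(self, ts, key, value):
        self.data[key] = value
        self.applied = ts


class CausalSession:
    def __init__(self):
        self.last_seen = 0    # watermark of this session's causal history

    def write(self, primary, ts, key, value):
        primary.apply(ts, key, value)
        self.last_seen = max(self.last_seen, ts)

    def read(self, replica, key):
        if replica.applied < self.last_seen:
            # A real system would wait for replication to catch up;
            # here we just surface the violation.
            raise RuntimeError("replica lags session; wait or retry")
        return replica.data.get(key)


primary, lagging = Replica(), Replica()
s = CausalSession()
s.write(primary, ts=1, key="x", value=42)
print(s.read(primary, "x"))       # 42
# s.read(lagging, "x") would raise: lagging.applied (0) < last_seen (1)
```

The cost hinted at in the model is exactly the tradeoff the post describes: waiting for replicas to catch up limits how fast certain operations can complete.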

Pruning Dynamic Rebuilds With libabigail

Tags: c++, open source, optimization

Complex C++ projects frequently struggle with lengthy build times. Splitting a project into multiple dynamically-linked components can give developers faster incremental rebuilds and shorter edit-compile-test cycles than relying on static linking, especially when there are a large number of test binaries. However, build systems usually do not realize all of the possible gains in dynamic incremental rebuilds due to how they handle transitive library dependencies. Red Hat's ABI introspection library libabigail offers one possible path to eliminating unnecessary transitive re-linking for some classes of source modifications.
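The pruning decision reduces to: re-link a library's dependents only when its ABI fingerprint actually changes, not on every rebuild. In practice the fingerprint would come from a tool such as libabigail's `abidw`; the sketch below is a hypothetical stand-in where a string models that ABI dump, just to show the caching logic:

```python
# Hypothetical sketch of the pruning decision: trigger transitive
# re-linking only when a shared library's ABI fingerprint changes.
# The abi_dump strings stand in for real ABI descriptions (e.g. the
# XML emitted by libabigail's abidw).
import hashlib

_abi_cache = {}   # library name -> fingerprint from the previous build


def abi_fingerprint(abi_dump: str) -> str:
    """Stable hash of the library's ABI description."""
    return hashlib.sha256(abi_dump.encode()).hexdigest()


def needs_transitive_relink(lib: str, abi_dump: str) -> bool:
    """True only when the ABI changed since the last recorded build."""
    fp = abi_fingerprint(abi_dump)
    changed = _abi_cache.get(lib) != fp
    _abi_cache[lib] = fp
    return changed


# A function-body-only edit leaves the ABI dump identical, so dependents
# of libfoo.so keep their existing links; a signature change does not.
print(needs_transitive_relink("libfoo.so", "int foo(int);"))   # True (first build)
print(needs_transitive_relink("libfoo.so", "int foo(int);"))   # False
print(needs_transitive_relink("libfoo.so", "long foo(int);"))  # True
```

The middle case is the win: source modifications that leave the ABI intact (the "some classes" above) no longer ripple re-links through every dependent test binary.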

Considering the Community Effects of Introducing an Official MongoDB Go Driver

Tags: golang, drivers, open source

What do you do when an open-source project you rely on no longer meets your needs?  When your choice affects not just you, but a larger community, what principles guide your decision?

Submitting patches is often the first option, but you're at the mercy of the maintainer to accept them.  If the changes you need are sweeping and substantial, the odds of acceptance are low.  Eventually, only a few realistic options remain: find an alternative, fork the project, or write your own replacement.  Everyone who depends on open source faces this conundrum at one time or another.

After relying for years on the community-developed mgo Go driver for MongoDB, MongoDB has begun work on a brand-new, internally-developed, open-source Go driver.  We know that releasing a company-sponsored alternative to a successful, community-developed project creates tension and uncertainty for users, so we did not make this decision lightly. We carefully considered how our choice would affect current and future Go users of MongoDB.

Investing in CS4All: One Year Later

Tags: cs4all, education

When a couple of New York City high school teachers partnered with MongoDB to teach computer science, did they succeed? Their curriculum was untested, and they were teaching in difficult districts where most students are from poor and minority families. I talked with these two teachers, Jeremy Mellema and Timothy Chen, back in September, when they had completed a summer fellowship at MongoDB and had just started teaching their curriculum; at the end of the academic year this spring, I visited Jeremy and Tim again to find out the result.

Their successes were sparse and partial. They discovered that their students' poor reading skills were a barrier to learning to code, and that teaching new coders how to solve problems is, itself, an unsolved problem. With a coarse unit of iteration—a school semester—it is painfully slow to experiment and find teaching methods that work. But even partial wins make a difference for individual kids, and the support of professional engineers at companies like MongoDB can be a powerful accelerant.