In an effort to improve repeatability, the MongoDB Performance team set out to reduce noise on several performance test suites run on EC2 instances. After thinking about our assumptions and the experiment setup, we began by recording data about our current setup and found no evidence of particularly good or bad EC2 instances. In the next step, we investigated IO and found that EBS instances are the stable option for us. Having found a very stable behavior as far as disks were concerned, this third and final experiment turns to tuning CPU related knobs to minimize noise from this part of the system.
Filtered by Tag: testing
In an effort to improve repeatability, the MongoDB Performance team set out to reduce noise on several performance test suites run on EC2 instances. At the beginning of the project, it was unclear whether our goal of running repeatable performance tests in a public cloud was achievable. Instead of debating the issue based on assumptions and beliefs, we decided to measure noise itself and see if we could make configuration changes to minimize it.
After thinking about our assumptions and the experiment setup, we began by recording data about our current setup.
On the MongoDB Performance team, we use EC2 to run daily system performance tests. After building a continuous integration system for performance testing, we realized that there were sources of random variation in our platform and system configuration which made a lot of our results non-reproducible. The run to run variation from the platform was bigger than the changes in MongoDB performance that we wanted to capture. To reduce such variation - environmental noise - from our test configuration, we set out on a project to measure and control for the EC2 environments on which we run our tests.
What do you do with a third-party tool that proves your application lacks a feature? Add that tool to your continuous integration system (after adding the feature, of course)! In our case we have added linearizable reads to MongoDB 3.4 and use Jepsen to test it.
What is Linearizability?
Linearizability is a property of distributed systems first introduced by Herlihy & Wing in their July 1990 article "Linearizability: a correctness condition for concurrent objects" (ACM Transactions on Programming Languages and Systems Journal). Peter Bailis probably provides the most accessible explanation of linearizability: "writes should appear to be instantaneous. Imprecisely, once a write completes, all later reads (where “later” is defined by wall-clock start time) should return the value of that write or the value of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write."
Fuzz testing is a method for subjecting a codebase to a tide of hostile input to supplement the test cases engineers create on their own. In part one of this pair, we looked at the hybrid nature of our fuzzer -- how it combines “smart” and “dumb” fuzzing to produce input random enough to provoke bugs, but structured enough to pass input validation and test meaningful codepaths. To wrap up, I’ll discuss how we isolate signal from the noise a fuzzer intrinsically produces, and the tooling that augments the root cause analyses we do when the fuzzer finds an issue.
In part one of two, we examine how our fuzzer hybridizes the two main types of fuzzing to achieve greater coverage than either method alone could accomplish. Part two will focus on the pragmatics of running the fuzzer in a production setting and distilling a root cause from the complex output fuzz tests often produce.
What's a fuzzer?
Fuzzing, or fuzz testing, is a technique of generating randomized, unexpected, and invalid inputs to a program to trigger untested code paths. Fuzzing was originally developed in the 1980s and has since proven to be effective at ensuring the stability of a wide range of systems, from filesystems to distributed clusters to browsers. As people attempt to make fuzzing more effective, two philosophies have emerged: smart, and dumb fuzzing. And as the state of the art evolves, the techniques that are used to implement fuzzers are being partitioned into categories, chief among them being “generational” and “mutational.” In many popular fuzzing tools, smart fuzzing corresponds to generational techniques, and dumb fuzzing to mutational techniques, but as we will see, this is not an intrinsic relationship. Indeed, in our case, the situation is precisely reversed.
At MongoDB we write open source database drivers in ten programming languages. We also help developers in our community replicate our drivers' behavior in even more (and more exotic) languages. Ideally, all drivers behave alike; or, where they differ, the differences are written down and justified. How can we herd all these cats along the same path?
For years we failed. Each false start at standardization left us more discouraged. But we’ve recently gained momentum on standardizing our drivers. Human-readable, machine-testable specs, coded in YAML, prove which code conforms and which does not. These YAML tests are the Cat-Herd's Crook: a tool to guide us all in the same direction.