The MongoDB Engineering Journal

A tech blog for builders, by builders

About the Author

Name: Robert Guo
Title: Software Engineer
Email: robert.guo@mongodb.com
Bio: Robert Guo is a software engineer on the MongoDB server team focusing on data consistency and correctness. He enjoys playing tennis and snowboarding in his spare time.

Posts by Robert Guo
  • MongoDB’s JavaScript Fuzzer: Harnessing the Havoc (2/2)

    Fuzz testing is a method for subjecting a codebase to a tide of hostile input to supplement the test cases engineers create on their own. In part one of this pair, we looked at the hybrid nature of our fuzzer – how it combines “smart” and “dumb” fuzzing to produce input random enough to provoke bugs, but structured enough to pass input validation and test meaningful codepaths. To wrap up, I’ll discuss how we isolate signal from the noise a fuzzer intrinsically produces, and the tooling that augments the root cause analyses we do when the fuzzer finds an issue.

    An unbridled fuzzer creates too much noise

    Fuzz testing is a game of random numbers. That randomness makes the fuzzer powerful… too powerful. Without some careful harnessing, it would just blow itself up all the time by creating errors within the testing code itself. Take the following block of code, which is something you would see in one of MongoDB’s JavaScript tests:

    while(coll.count() < 654321)
        assert(coll.update({a:1}, {$set: {...}}))
    

    This code does a large number of updates to a document stored in MongoDB. If we were to put it through the fuzzer, a possible test-case that the fuzzer could produce is this:

    while(true) 
        assert(coll.update({}, {$set: {"a.654321" : 1}}))
    

    The new code now tests something completely different. It tries to set the element at index 654321 of an array stored in every document in some MongoDB collection.

    Now, this is an interesting test-case. Using the $set operator with such a large array may not be something we thought of testing explicitly and could trigger a bug (in fact it does). But the interaction between the fuzzed true condition and the residual while loop is going to hang the test! Unless, that is, the assert call in the while loop fails, which could happen if the line defining coll in the original test (not shown here) is mutated or deleted by the fuzzer, leaving coll undefined. If the assert call fails, it would be caught by the Mongo shell and cause it to terminate.

    But neither the hang nor the assertion failure is caused by a bug in MongoDB. They are just byproducts of a randomly generated test-case, and they represent the two classes of noise we have to filter out of our fuzz testing: branch logic and assertion failures.
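
    The two classes of noise can be made concrete with a small sketch. This excerpt does not say how the fuzzer actually filters them, so what follows is only one possible approach under stated assumptions: boundedWhile, quietly, fakeColl, and the cap of 100 iterations are all hypothetical, and a plain object stands in for a real collection.

```javascript
// Hypothetical helpers; none of these names come from MongoDB's fuzzer.

// Branch-logic noise: run a loop body at most `maxIterations` times,
// even if the (possibly fuzzed) condition never becomes false.
function boundedWhile(condition, body, maxIterations) {
    let iterations = 0;
    while (iterations < maxIterations && condition()) {
        body();
        iterations++;
    }
    return iterations;
}

// Assertion noise: run a statement and record, rather than rethrow,
// any exception raised by the test code itself.
function quietly(statement, swallowed) {
    try {
        statement();
    } catch (e) {
        swallowed.push(e.message);
    }
}

// Stand-in for a collection; a real test would talk to a server.
const fakeColl = {
    updates: 0,
    update() { this.updates++; return true; },
};

const swallowed = [];

// The fuzzed `while(true)` loop now stops after 100 iterations.
const ran = boundedWhile(() => true, () => {
    quietly(() => fakeColl.update({}, {$set: {"a.654321": 1}}), swallowed);
}, 100);

// A statement that references an undefined variable no longer aborts the run.
quietly(() => missingColl.update({}, {}), swallowed);

console.log(ran, fakeColl.updates, swallowed.length);
```

    In this sketch the endless loop is cut off at the cap and the reference to an undefined variable is recorded instead of terminating the run, so neither byproduct of random generation masks what the test was exercising.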

  • MongoDB’s JavaScript Fuzzer: Creating Chaos (1/2)

    As MongoDB becomes more feature-rich and complex with time, our need for more sophisticated bug-finding methods grows as well. We recently added a homegrown JavaScript fuzzer to our toolkit, and it is now our most prolific bug-finding tool, responsible for finding almost 200 bugs over the course of two release cycles. These bugs span a range of MongoDB components from sharding to the storage engine, with symptoms ranging from deadlocks to data inconsistency. We run the fuzzer as part of our continuous integration system, Evergreen, where it frequently catches bugs in newly committed code.

    In part one of two, we examine how our fuzzer hybridizes the two main types of fuzzing to achieve greater coverage than either method alone could accomplish. Part two will focus on the pragmatics of running the fuzzer in a production setting and distilling a root cause from the complex output fuzz tests often produce.

    What’s a fuzzer?

    Fuzzing, or fuzz testing, is a technique for generating randomized, unexpected, and invalid inputs to a program in order to trigger untested code paths. Fuzzing was originally developed in the 1980s and has since proven effective at ensuring the stability of a wide range of systems, from filesystems to distributed clusters to browsers. As people attempt to make fuzzing more effective, two philosophies have emerged: smart and dumb fuzzing. And as the state of the art evolves, the techniques used to implement fuzzers are being partitioned into categories, chief among them “generational” and “mutational.” In many popular fuzzing tools, smart fuzzing corresponds to generational techniques, and dumb fuzzing to mutational techniques, but as we will see, this is not an intrinsic relationship. Indeed, in our case, the situation is precisely reversed.
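
    The difference between the two technique families can be shown with a toy example. This is a minimal sketch and not MongoDB's fuzzer: the function names, the tiny grammar, and the seeded random number generator are all assumptions made for illustration.

```javascript
// Toy illustration only; every name here is hypothetical.

// Tiny seeded generator (a linear congruential generator) so runs repeat.
function makeRng(seed) {
    let state = seed >>> 0;
    return function () {
        state = (state * 1664525 + 1013904223) % 4294967296;
        return state / 4294967296; // value in [0, 1)
    };
}

// Mutational: start from an existing input and change one character.
function mutate(input, rng) {
    const pos = Math.floor(rng() * input.length);
    const ch = String.fromCharCode(32 + Math.floor(rng() * 95)); // printable ASCII
    return input.slice(0, pos) + ch + input.slice(pos + 1);
}

// Generational: build a new input from scratch out of a grammar.
const grammar = {
    operators: ["$set", "$inc", "$unset"],
    fields: ["a", "b", "a.0"],
};

function generate(rng) {
    const pick = (choices) => choices[Math.floor(rng() * choices.length)];
    const operator = pick(grammar.operators);
    const field = pick(grammar.fields);
    const value = Math.floor(rng() * 1000);
    return "coll.update({}, {" + operator + ": {\"" + field + "\": " + value + "}})";
}

const existingTest = "coll.update({a: 1}, {$set: {b: 2}})";
console.log(mutate(existingTest, makeRng(1)));
console.log(generate(makeRng(1)));
```

    Here the mutator knows nothing about the structure of its input, while the generator knows the grammar, which matches the conventional pairing described above. Nothing forces that pairing: a mutator could just as well operate on a parsed structure, and a generator could emit unstructured noise.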


Copyright © 2016 MongoDB, Inc.
Mongo, MongoDB, and the MongoDB leaf logo are registered trademarks of MongoDB, Inc.
