Repeatable performance tests: EBS instances are the stable option

| |

In an effort to improve repeatability, the MongoDB Performance team set out to reduce noise on several performance test suites run on EC2 instances. At the beginning of the project, it was unclear whether our goal of running repeatable performance tests in a public cloud was achievable. Instead of debating the issue based on assumptions and beliefs, we decided to measure noise itself and see if we could make configuration changes to minimize it.

After thinking about our assumptions and the experiment setup, we began by recording data about our current setup and found no evidence of particularly good or bad EC2 instances. However, we found that the results of repeated tests had a high variance. Given our test data and our knowledge of the production system, we had observed that many of the noisiest tests did the most IO (being either sensitive to IO latency, or bandwidth). After performing the first baseline tests, we therefore decided to focus on IO performance through testing both AWS instance types and IO configuration on those instances.

Investigate IO

As we are explicitly focusing on IO in this step, we added an IO specific test (fio) to our system. This allows us to isolate the impact of IO noise to our existing benchmarks. The IO specific tests focus on:

  • Operation latency
  • Streaming bandwidth
  • Random IOPs (IO per second)

We look first at the IO specific results, and then our general MongoDB benchmarks. In the below graph, we are graphing the "noise" metric as a percentage computed from (max-min)/median and lower is better. c3.8xlarge with ephemeral storage is our baseline configuration which we were using in our production environment.

i2.8xlarge shows best results with low noise on throughput and latency

i2.8xlarge shows best results with low noise on throughput and latency

The IO tests show some very interesting results.

  1. The c3.8xlarge with EBS PIOPS shows less noise than the c3.8xlarge with its ephemeral disks. This was quite unexpected. In fact the c3.8xlarge with ephemeral storage (our existing configuration) is just about the worst choice.

  2. The i2.8xlarge looks best all around with low noise on throughput and latency.

  3. The c4.8xlarge shows higher latency noise than the c3.8xlarge. We would have expected any difference to favor the c4.8xlarge instances, as they are EBS optimized.

After these promising results, we examined the results of our MongoDB benchmarks next. At the time that we did this work, MongoDB had two storage engines (wiredTiger and MMAPv1), with MMAPv1 being the default, but now deprecated, option. There were differences in the results between the two storage engines, but they shared a common trend.

c3.8xlarge with PIOPS performs best with all results below 10% noise for the wiredTiger storage engine

c3.8xlarge with PIOPS performs best with all results below 10% noise for the wiredTiger storage engine

c3.8xlarge with PIOPS performs best with most results below 10% noise for the mmap storage engine

c3.8xlarge with PIOPS performs best with most results below 10% noise for the mmap storage engine

There were no configurations that were best across the board. That said, there was a configuration with below 10% noise for all but 1 test: c3.8xlarge with EBS PIOPS. Interestingly, while i2 was the best for our focused IO tests, it was not for our actual tests.

Valuable lessons learned:

  • As far as repeatable results are concerned, the "local" SSDs we had been using performed worse compared to any other alternative we could have possibly chosen!

  • Contrary to popular belief, when using Provisioned IOPS with EBS, the performance is both good in absolute terms, and very very stable! This is true for our IO tests and our general tests. The latency of disk requests does have more variability than the SSD alternatives, but the IOPS performance was super stable. For most of our tests, the latter is the important characteristic.

  • The i2 instance family has a much higher performance SSD, and in fio tests showed almost zero variability. It also happens to be a very expensive instance type. However, while this instance type was indeed a great choice in theory, it turns out that our MongoDB test results were quite noisy. Upon further investigation, we learned that the noisy results were due to unstable performance of MongoDB itself. As i2.8xlarge has more RAM than c3.8xlarge, MongoDB on i2.8xlarge is able to hold much more dirty data in RAM. Flushing that much dirty data to disk was causing issues.

Switching from ephemeral to EBS disks in production

Based on the above results, we changed our production configuration to run on EBS disks instead of ephemeral SSD. (We were already running on c3.8xlarge instance types, which turned out to have the lowest noise in the above comparison, so decided to keep using those.)

Performance becomes more stable when using EBS

Performance becomes more stable when using EBS

After running with the changes for a couple of weeks, you could clearly see how the day-to-day variation of test results decreased dramatically. This instantly made the entire System Performance project more useful to the development team and MongoDB as a whole.

Conclusion

Focusing on IO performance proved useful. As it turns out using Ephemeral (SSD) disks was just about the worst choice for our performance test. Instead, using Provisioned IOPS showed the most stable rate results. While i2 instances were the best in our non-MongoDB benchmark tests, they proved less than ideal in practice. This highlights quite clearly that you need to measure your actual system and assume nothing to get the best results.

This is the second of three bigger experiments we performed in our quest to reduce variability in performance tests on EC2 instances. You can read more about the top level setup and results as well as how we found out that EC2 instances are neither good nor bad and that CPU options are best disabled.

- David Daly & Henrik Ingo