
AlBlue’s Blog

Macs, Modularity and More

Using open source tools for performance testing

2006 Test

Goranka Bjedov (from Google) gave one of the most entertaining presentations of the conference to date, on performance testing and on using open-source tools to obtain the data and derive results from those tests. She opened with a useful primer on the terminology of performance testing:

Performance test
How the system behaves under a given load
Stress test
At what point the system breaks under load
Load test
How the system behaves under heavy loads
Benchmark test
Simple, measurable and reproducible test to give an overall indication of performance or specific aspects of performance
Reliability/longevity test
Test that it meets product requirements over a length of service
Availability test
Testing how long the system takes to recover after a failure

Of course, the reason for performing such performance tests is not only to generate marketing data (and there's a great set of quotes about that in the video) but also to ensure that the system meets any contractual requirements. It also helps to ensure that the system does what you expect, and although not explicitly meant for the purpose, can reveal memory leaks through a general deterioration over time.

In order to test systems under load, you really need a dedicated set of machines to run the tests on (obviously), but also machines that are able to record the data. To keep the overhead of measurement as low as possible, a separate machine (or set of machines) should be set up to record the data, with the machines hosting the actual tests periodically reporting their performance metrics.
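As a minimal sketch of that separation (all names here are illustrative, not from the talk): the system under test hands off lightweight metric samples to a separate collector, so that recording costs the test machine as little as possible.

```python
import queue
import threading
import time

# Hand-off channel between the machine under test and the collector.
metrics = queue.Queue()

def system_under_test(samples=5):
    """Periodically report a cheap (timestamp, metrics) sample."""
    for i in range(samples):
        metrics.put((time.time(), {"requests_per_sec": 100 + i}))
        time.sleep(0.01)

def collector(results):
    """Drain samples on a separate thread (standing in for a separate machine)."""
    while True:
        item = metrics.get()
        if item is None:          # sentinel: test run finished
            break
        results.append(item)

results = []
t = threading.Thread(target=collector, args=(results,))
t.start()
system_under_test()
metrics.put(None)
t.join()
print(f"collected {len(results)} samples")
```

In a real setup the queue would be replaced by a network hand-off, but the shape is the same: the test host does only a cheap, non-blocking put, and the analysis happens elsewhere.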

Generating data takes up a lot of space; and of course, there's a distinction between raw data and information. So clearly, there need to be reporting systems and analysers that can distil the important aspects (averages, confidence ranges and so on). This may not happen in real time, and often different machines will do this reporting (or they may even reuse the hardware of the system under test once it's no longer running).

One important point that came out of this is that when running under normal conditions, there's no point in exceeding 75-80% utilisation on the machine; if you hit 100% all the time, you're much more likely to trigger conditions that cause otherwise unexplained pauses (log rotations, swapping, etc.). Unless you're running a stress test, loading the machine to 100% utilisation won't achieve much, or give you repeatable figures. And since each test run is different, you'll need to repeat the process 3-5 times to be able to take an average.
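The "repeat 3-5 times and average" step can be sketched with the standard library alone; the latency figures below are made up for illustration.

```python
import statistics

# Hypothetical latency samples (ms) from five repeated runs of the same test.
runs = [
    [101, 99, 103, 98, 100],
    [105, 102, 99, 101, 104],
    [97, 100, 102, 99, 101],
    [103, 98, 100, 104, 99],
    [100, 101, 99, 102, 100],
]

# Average each run first, then look at run-to-run variation.
per_run_means = [statistics.mean(r) for r in runs]
overall_mean = statistics.mean(per_run_means)
spread = statistics.stdev(per_run_means)
print(f"mean latency: {overall_mean:.1f} ms, run-to-run stdev: {spread:.2f} ms")
```

The run-to-run standard deviation is what tells you whether the figures are repeatable at all; a large spread means more runs (or a quieter environment) are needed before the average means anything.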

In terms of the tools available, there are advantages and disadvantages to a variety of different setups. Goranka talked about the advantages and disadvantages of using in-house, commercial and open-source solutions:

In-house
  • Frequently available as a side-effect of development processes
  • Easy to use
  • Source code available
  • Cannot be shared with customers
  • Validity of results questionable outside of the company
  • Tool maintenance can be costly and tedious (often more expensive than vendor products)
Commercial
  • Support many protocols
  • Professionally developed and maintained
  • Results and scripts can be shared and verified
  • Pretty graphics to appeal to people who are important
  • (Extremely) expensive
  • Proprietary scripting language
  • No access to source code
Open Source
  • Good price (purchase and maintenance)
  • Easy to share results
  • Source code available
  • Scripting in standard programming languages
  • Support for different protocols limited
  • Steeper learning curve
  • May be cost in working around bugs
  • Getting people who know how to use it is a bonus

Some examples of open-source products include OpenSTA (though it's Windows-only), The Grinder and JMeter. One benefit of JMeter is that it can run with a UI, as well as being invoked from the command line to run tests.
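A JMeter command-line run looks like the following (the file names `plan.jmx` and `results.jtl` are placeholders, not from the talk); the sketch is guarded so it is a no-op on machines where JMeter isn't installed.

```shell
# -n  run without the GUI (non-GUI mode)
# -t  the test plan to execute
# -l  the file to log sample results to
CMD="jmeter -n -t plan.jmx -l results.jtl"

if command -v jmeter >/dev/null 2>&1; then
  $CMD
else
  echo "JMeter not on PATH; would run: $CMD"
fi
```

Running without the GUI is what makes JMeter usable from cron jobs or continuous-integration scripts, with the `.jtl` results file left behind for the separate analysis step.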

So, the process involved in setting up a performance testing process looks like:

  1. Set up a realistic environment (the whole set: hardware, software, etc.). Also note that results on small systems cannot be extrapolated to large systems by simply scaling the numbers.
  2. Execute a stress test. The maximum supported load is probably at about 80% of system utilisation. When stressed, only the latency of requests should be affected.
  3. Execute a performance test. The test should be long enough to reach a steady state, after a suitable warm-up period. During the test, latency and throughput data should be collected. The reported system performance should be the lowest of the recorded values; that's the only level that you can guarantee (but bear in mind that the average will therefore actually be higher than this). The test should be repeated several times (3-5) to ensure statistical consistency.
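Step 3 can be sketched as follows, with made-up throughput numbers: discard the warm-up samples, average each run's steady state, then report the lowest run as the guaranteed figure.

```python
import statistics

def steady_state_throughput(samples, warmup=3):
    """Mean throughput after discarding an (assumed) warm-up period."""
    steady = samples[warmup:]
    return statistics.mean(steady)

# Hypothetical throughput samples (req/s) from three repeated runs;
# the first few values of each run are warm-up and are discarded.
run_samples = [
    [40, 70, 90, 98, 97, 99, 96],
    [35, 72, 88, 95, 94, 96, 95],
    [42, 68, 91, 97, 99, 98, 97],
]

per_run = [steady_state_throughput(r) for r in run_samples]
print(f"guaranteed (lowest) throughput: {min(per_run):.1f} req/s")
```

Reporting `min(per_run)` rather than the overall mean is exactly the conservative choice described above: it is the one figure every observed run actually achieved.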

There were also many notable quotes (some of which I've probably missed out on) that came out during the talk; the video will be well worth watching. Here's a selection of some of them:

  • Developers are over-confident in their software
  • It will fail where it cannot possibly fail
  • Performance tests shouldn't be used to find memory leaks (but may reveal them)
  • There is no point doing performance tests without monitoring
  • Marketing: if they want maximum numbers, come up with a technical reason why they can't quote them that they won't understand
  • Marketing: which planet do they come from, and why does everyone on that planet have the same haircut?
  • Interpolation is great; extrapolation will kill you

I may have transcribed some of those quotes incorrectly ... any errors are my own. Oh, and the book that she was talking about is "Optimizing Linux Performance: A Hands-On Guide to Linux Performance Tools" (ISBN 0131486829), on Amazon. The video is now available.