I am getting ready to perform a series of performance comparisons of various off-the-shelf products.

What do I need to do to show credibility in the tests? How do I design my benchmark tests so that they are respectable?

I am also interested in any suggestions on the actual design of the tests: ways to load data without affecting the tests (the Heisenberg Uncertainty Principle problem), ways to monitor, etc.

A: 

Why do you care about the performance? In either case, the time taken to write the message to wherever you are storing your log will be far slower than anything else.

If you are really doing that much logging, then you are likely to need to index your log files so you can find the log entries you need; at that point you are no longer doing standard logging.

Ian Ringrose
This isn't very helpful to my question. But systems need auditing, and you don't want to be slower because of a logging service. Just because it says "log" doesn't mean it really has to be a file: some use a DB, Trace, or Event Logs, and others use files.
Nix
I tend to think that auditing should be built into the database schema so you can report and search on it. A "write-only" audit is not of much use!
Ian Ringrose
+2  A: 

This is a bit tricky to answer without knowing what sort of "off the shelf" products you are trying to assess. Are you looking at UI responsiveness, throughput (e.g. email, transactions/sec), startup time, etc.? Each of these has different criteria for what measures you should track and different tools for testing or evaluating them. But to answer some of your general questions:

  1. Credibility - this is important. Try to make sure that whatever you are measuring has little run-to-run variance. Use the technique of doing several runs of the same scenario, discard the outliers (i.e. your lowest and highest), and evaluate your avg/max/min/median values (see the code sketch below). If you're doing some sort of throughput test, consider making it long running so you have a good sample set. For example, if you are looking at something like Microsoft Exchange and thus are using their perf counters, try to make sure you are taking frequent samples (once per second or every few seconds) and have the test run for 20 minutes or so. Again, chop off the first few minutes and the last few minutes to eliminate any startup/shutdown noise.

  2. Heisenberg - tricky. In most modern systems, depending on what application/measures you are measuring, you can minimize this impact by being smart about what/how you are measuring. Sometimes (as in the Exchange example), you'll see near-zero impact. Try to use the least invasive tools possible. For example, if you're measuring startup time, consider using xperfinfo and utilize the events built into the kernel. If you're using perfmon, don't flood the system with extraneous counters that you don't care about. If you're doing some extremely long-running test, lower your sampling frequency.

Also try to eliminate any sources of environment variability or possible sources of noise. If you're doing something network intensive, consider isolating the network. Try to disable any services or applications that you don't care about. Limit any sort of disk IO, memory-intensive operations, etc. If disk IO might introduce noise in something that is CPU bound, consider using an SSD.
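
To make the multi-run-and-trim idea from point 1 concrete, here is a minimal C# sketch. The iteration count, the trimming of exactly one low and one high sample, and the placeholder scenario being timed are assumptions for illustration, not part of the original advice:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    class BenchmarkRunner
    {
        // Runs a scenario several times, drops the lowest and highest samples,
        // and reports min/max/avg/median of what remains.
        static void Measure(string name, Action scenario, int runs = 12)
        {
            var samples = new List<double>();
            for (int i = 0; i < runs; i++)
            {
                var sw = Stopwatch.StartNew();
                scenario();
                sw.Stop();
                samples.Add(sw.Elapsed.TotalMilliseconds);
            }

            // Trim the outliers: discard one lowest and one highest sample.
            var trimmed = samples.OrderBy(s => s).Skip(1).Take(runs - 2).ToList();

            Console.WriteLine("{0}: min={1:F2}ms max={2:F2}ms avg={3:F2}ms median={4:F2}ms",
                name, trimmed.First(), trimmed.Last(), trimmed.Average(),
                trimmed[trimmed.Count / 2]);
        }

        static void Main()
        {
            // Placeholder: replace the lambda with a call into the product under test.
            Measure("sample scenario", () => System.Threading.Thread.Sleep(10));
        }
    }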

When designing your tests, keep repeatability in mind. If you're doing some sort of microbenchmark-type testing (e.g. a perf unit test), then have your infrastructure support running the same operation n times in exactly the same way. If you're driving UI, try not to physically drive the mouse; instead use the underlying accessibility layer (MSAA, UIAutomation, etc.) to hit controls directly and programmatically (see the sketch below).
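
As a rough illustration of driving controls through the UI Automation managed API (System.Windows.Automation) rather than the mouse, here is a minimal C# sketch. The window name "Calculator" and button name "7" are placeholder assumptions, and the project needs references to UIAutomationClient and UIAutomationTypes:

    using System;
    using System.Windows.Automation;

    class UiDriver
    {
        static void Main()
        {
            // Find a top-level window by name (placeholder: "Calculator").
            var window = AutomationElement.RootElement.FindFirst(
                TreeScope.Children,
                new PropertyCondition(AutomationElement.NameProperty, "Calculator"));
            if (window == null)
                throw new InvalidOperationException("Window not found.");

            // Find a button inside it by name (placeholder: "7").
            var button = window.FindFirst(
                TreeScope.Descendants,
                new PropertyCondition(AutomationElement.NameProperty, "7"));

            // Invoke the control directly instead of moving the mouse to it.
            var invoke = (InvokePattern)button.GetCurrentPattern(InvokePattern.Pattern);
            invoke.Invoke();
        }
    }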

Again, this is just general advice. If you have more specifics then I can try to follow up with more relevant guidance.

Enjoy!

nithins
I was going to start with log4net and Enterprise Library logging, but there are others I would like to compare as well.
Nix
A: 

Your question is very interesting but a bit vague, because without knowing what you will be testing it is not easy to give concrete advice.

You can test performance from many different angles; depending on the use or target of the library, one approach or another will be appropriate. I will try to enumerate some of the things you may have to consider for measurement:

  • Multithreading: if the library uses it, or your software will use the library in a multithreaded context, then you may have to test it with many different processor and multiprocessor configurations to see how it reacts.
  • Startup time: its importance depends on how intensively you will use the library and on the nature of the product being built with it (client, server …).
  • Response time: do not take the first execution; execute the same call many times after the first one and take the average. System.Diagnostics.Stopwatch is very useful for that (see the sketch after this list).
  • Memory consumption: analyze the growth, and beware of exponential growth ;). Go a step further and measure the number of objects being created and disposed.
  • Responsiveness: you should not only measure raw performance; how fast the product feels to the user is very important too.
  • Network: if the library uses resources on the network, you may have to test it with different bandwidth and latency configurations; there is software to simulate these conditions.
  • Data: try to create many different test data sets, covering, for example: one big block of raw data, a large set made of many smaller chunks, a long iteration with small pieces of data, …
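
A minimal C# sketch of the response-time bullet above (warm up first, then average many calls with Stopwatch); the LogMessage placeholder and the iteration count are assumptions for illustration:

    using System;
    using System.Diagnostics;

    class ResponseTimeTest
    {
        // Placeholder for a call into the library under test.
        static void LogMessage() { /* e.g. logger.Info("benchmark message"); */ }

        static void Main()
        {
            // Warm-up: the first execution pays JIT and initialization costs,
            // so it is excluded from the measurement.
            LogMessage();

            const int iterations = 10000;
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                LogMessage();
            sw.Stop();

            Console.WriteLine("Average call time: {0:F4} ms",
                sw.Elapsed.TotalMilliseconds / iterations);
        }
    }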

Tools:

  • System.Diagnostics.Stopwatch: essential for benchmarking method calls
  • Performance counters: whenever available, they are very useful for knowing what's happening inside, allowing you to monitor the software without affecting its performance (see the sketch after this list).
  • Profilers: there are some good memory and performance profilers on the market, but, as you said, they always affect the measurements. They are good for finding bottlenecks in your software, but I don't think you can use them for a comparison test.
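
As an illustration of the performance-counter bullet, here is a minimal C# sketch that samples two standard Windows counters once per second while the test runs elsewhere; the specific counters and the 60-second duration are assumptions, not a recommendation from this answer:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class CounterSampler
    {
        static void Main()
        {
            // Standard Windows counters; the categories/instances chosen here
            // are illustrative placeholders.
            var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
            var mem = new PerformanceCounter("Memory", "Available MBytes");

            // Sample once per second for 60 seconds.
            for (int i = 0; i < 60; i++)
            {
                Console.WriteLine("cpu={0:F1}% availableMem={1:F0}MB",
                    cpu.NextValue(), mem.NextValue());
                Thread.Sleep(1000);
            }
        }
    }
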
jmservera