views:

196

answers:

4

Sorry, it's a long one, but I'm just explaining my train of thought as I analyze this. Questions at the end.

I have an understanding of what goes into measuring running times of code. It's run multiple times to get an average running time to account for differences per run and also to get times when the cache was utilized better.

In an attempt to measure running times for someone, I came up with this code after multiple revisions.

In the end I ended up with this code which yielded the results I intended to capture without giving misleading numbers:

// implementation C
static void Test<T>(string testName, Func<T> test, int iterations = 1000000)
{
    Console.WriteLine(testName);
    Console.WriteLine("Iterations: {0}", iterations);
    var results = Enumerable.Repeat(0, iterations).Select(i => new System.Diagnostics.Stopwatch()).ToList();
    var timer = System.Diagnostics.Stopwatch.StartNew();
    for (int i = 0; i < results.Count; i++)
    {
        results[i].Start();
        test();
        results[i].Stop();
    }
    timer.Stop();
    Console.WriteLine("Time(ms): {0,3}/{1,10}/{2,8} ({3,10})", results.Min(t => t.ElapsedMilliseconds), results.Average(t => t.ElapsedMilliseconds), results.Max(t => t.ElapsedMilliseconds), timer.ElapsedMilliseconds);
    Console.WriteLine("Ticks:    {0,3}/{1,10}/{2,8} ({3,10})", results.Min(t => t.ElapsedTicks), results.Average(t => t.ElapsedTicks), results.Max(t => t.ElapsedTicks), timer.ElapsedTicks);
    Console.WriteLine();
}

Of all the code I've seen that measures running times, they were usually in the form:

// approach 1 pseudocode
start timer;
loop N times:
    run testing code (directly or via function);
stop timer;
report results;

This was good in my mind since with the numbers, I have the total running time and can easily work out the average running time and would have good cache locality.

But one set of values that I thought were important to have were minimum and maximum iteration running time. This could not be calculated using the above form. So when I wrote my testing code, I wrote them in this form:

// approach 2 pseudocode
loop N times:
    start timer;
    run testing code (directly or via function);
    stop timer;
    store results;
report results;

This is good because I could then find the minimum, maximum as well as average times, the numbers I was interested in. Until now I realized that this could potentially skew results since the cache could potentially be affected since the loop wasn't very tight giving me less than optimal results.


The way I wrote the test code (using LINQ) added additional overheads which I knew about but ignored since I was just measuring the running code, not the overheads. Here was my first version:

// implementation A
static void Test<T>(string testName, Func<T> test, int iterations = 1000000)
{
    Console.WriteLine(testName);
    var results = Enumerable.Repeat(0, iterations).Select(i =>
    {
        var timer = System.Diagnostics.Stopwatch.StartNew();
        test();
        timer.Stop();
        return timer;
    }).ToList();
    Console.WriteLine("Time(ms): {0,3}/{1,10}/{2,8}", results.Min(t => t.ElapsedMilliseconds), results.Average(t => t.ElapsedMilliseconds), results.Max(t => t.ElapsedMilliseconds));
    Console.WriteLine("Ticks:    {0,3}/{1,10}/{2,8}", results.Min(t => t.ElapsedTicks), results.Average(t => t.ElapsedTicks), results.Max(t => t.ElapsedTicks));
    Console.WriteLine();
}

Here I thought this was fine since I'm only measuring the times it took to run the test function. The overheads associated with LINQ are not included in the running times. To reduce the overhead of creating timer objects within the loop, I made the modification.

// implementation B
static void Test<T>(string testName, Func<T> test, int iterations = 1000000)
{
    Console.WriteLine(testName);
    Console.WriteLine("Iterations: {0}", iterations);
    var results = Enumerable.Repeat(0, iterations).Select(i => new System.Diagnostics.Stopwatch()).ToList();
    results.ForEach(t =>
    {
        t.Start();
        test();
        t.Stop();
    });
    Console.WriteLine("Time(ms): {0,3}/{1,10}/{2,8} ({3,10})", results.Min(t => t.ElapsedMilliseconds), results.Average(t => t.ElapsedMilliseconds), results.Max(t => t.ElapsedMilliseconds), results.Sum(t => t.ElapsedMilliseconds));
    Console.WriteLine("Ticks:    {0,3}/{1,10}/{2,8} ({3,10})", results.Min(t => t.ElapsedTicks), results.Average(t => t.ElapsedTicks), results.Max(t => t.ElapsedTicks), results.Sum(t => t.ElapsedTicks));
    Console.WriteLine();
}

This improved overall times but caused a minor problem. I added the total running time in the report by adding each iteration's times but gave misleading numbers since the times were short and didn't reflect the actual running time (which was usually much longer). I needed to measure the time of the entire loop now so I moved away from LINQ and ended up with the code I have now at the top. This hybrid gets the the times I think are important with minimal overhead AFAIK. (starting and stopping the timer just queries the high resolution timer) Also any context switching occurring is unimportant to me as it's part of normal execution anyway.

At one point, I forced the thread to yield within the loop to make sure that it is given the chance at some point at a convenient time (if the test code is CPU bound and doesn't block at all). I'm not too concerned about the processes running which might change the cache for the worse since I would be running these tests alone anyway. However, I came to the conclusion that for this particular case, it was unnecessary to have. Though I might incorporate it in THE final final version if it proves beneficial in general. Perhaps as an alternate algorithm for certain code.


Now my questions:

  • Did I make some right choices? Some wrong ones?
  • Did I make wrong assumptions about the goals in my thought process?
  • Would the minimum or maximum running times really be useful information to have or is it a lost cause?
  • If so, which approach would be better in general? The time running in a loop (approach 1)? Or the time running just the code in question (approach 2)?
  • Would my hybrid approach be ok to use in general?
  • Should I yield (for reasons explained in the last paragraph) or is that more harm to the times than necessary?
  • Is there a more preferred way to do this that I did not mention?

Just to be clear, I'm not looking for an all-purpose, use anywhere, accurate timer. I just want to know of an algorithm that I should use when I want a quick to implement, reasonably accurate timer to measure code when a library or other 3rd party tools is not available.

I'm inclined to write all my test code in this form should there be no objections:

// final implementation
static void Test<T>(string testName, Func<T> test, int iterations = 1000000)
{
    // print header
    var results = Enumerable.Repeat(0, iterations).Select(i => new System.Diagnostics.Stopwatch()).ToList();
    for (int i = 0; i < 100; i++) // warm up the cache
    {
        test();
    }
    var timer = System.Diagnostics.Stopwatch.StartNew(); // time whole process
    for (int i = 0; i < results.Count; i++)
    {
        results[i].Start(); // time individual process
        test();
        results[i].Stop();
    }
    timer.Stop();
    // report results
}
A: 

The logic in Approach 2 feels 'righter' to me, but I'm just a CS student.

I came across this link that you might find of interest: http://www.yoda.arachsys.com/csharp/benchmark.html

Quinn1000
I've always thought so too but I always see it in the form of approach 1 by other developers. So it is either a better approach or they didn't think to try approach 2 (or 3). My observations on the numbers trying these different approaches makes it unclear to me. At least I know I'm not alone on this. :) And thanks for the link. Not really the kind of thing I'm looking for but it's definitely appreciated. And it reminded me of the word I intended to use for a tag: [benchmark]. ;)
Jeff M
A: 

Depending on what the running time of the code you're testing is, it's quite difficult to measure the individual runs. If the runtime of the code your testing is multiple seconds, your approach of timing the specific run will most likely not be a problem. If it's in the vicinity of milliseconds, your results will probably very too much. If you e.g. have a context switch or a read from the swap file at the wrong moment, the runtime of that run will be disproportionate to the average runtime.

Pieter
What do you suppose is a good threshold for the minimum running time of the test code? I know the test code should run at least a few milliseconds consistently at the minimum. I had always aimed for at least 10.
Jeff M
+1  A: 

I think your first code sample seems like the best approach.

Your first code sample is small, clean and simple and doesn't use any major abstractions during the test loop which may introduce hidden overhead.

Use of the Stopwatch class is a good thing as it simplifies the code one normally has to write to get high-resolution timings.

One thing you might consider is providing the option to iterate the test for a smaller number of times untimed before entering the timing loop to warm up any caches, buffers, connections, handles, sockets, threadpool threads etc. that the test routine may exercise.

HTH.

bitcrazed
Sorry, I should have labeled the pieces of code just in case any one referred to it. Were you referring to implementation C (the first code sample listed in the question) or implementation A (the first code attempt)?
Jeff M
I have seen the small untimed loop once and thought it was a good idea. I think I'll do that too from now on.
Jeff M