(background: http://stackoverflow.com/questions/1097467/why-should-i-use-int-instead-of-a-byte-or-short-in-c)

To satisfy my own curiosity about the pros and cons of using the "appropriate size" integer versus the "optimized" integer, I wrote the following code, which reinforced what I previously held true about int performance in .NET (and which is explained in the link above): that .NET is optimized for int performance rather than short or byte.

DateTime t;
long a, b, c;

t = DateTime.Now;
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
a = DateTime.Now.Ticks - t.Ticks;

t = DateTime.Now;
for (short index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
b = DateTime.Now.Ticks - t.Ticks;

t = DateTime.Now;
for (byte index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
c = DateTime.Now.Ticks - t.Ticks;

Console.WriteLine(a.ToString());
Console.WriteLine(b.ToString());
Console.WriteLine(c.ToString());

This gives roughly consistent results in the area of...

~950000

~2000000

~1700000

which is in line with what I would expect to see.

However, when I try repeating the loops for each data type like this (shown here for int; the short and byte versions follow the same pattern)...

t = DateTime.Now;
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
for (int index = 0; index < 127; index++)
{
    Console.WriteLine(index.ToString());
}
a = DateTime.Now.Ticks - t.Ticks;

the numbers are more like...

~4500000

~3100000

~300000

Which I find puzzling. Can anyone offer an explanation?

NOTE: In the interest of comparing like for like, I've limited the loops to 127 because of the range of the byte value type. Also, this is an act of curiosity, not production-code micro-optimization.

+5  A: 

The majority of this time is probably spent writing to the console. Try doing something other than that in the loop...

Additionally:

  • Using DateTime.Now is a bad way of measuring time. Use System.Diagnostics.Stopwatch instead.
  • Once you've got rid of the Console.WriteLine call, a loop of 127 iterations is going to be too short to measure. You need to run the loop lots of times to get a sensible measurement.

Here's my benchmark:

using System;
using System.Diagnostics;

public static class Test
{    
    const int Iterations = 100000;

    static void Main(string[] args)
    {
        Measure(ByteLoop);
        Measure(ShortLoop);
        Measure(IntLoop);
        Measure(BackToBack);
        Measure(DelegateOverhead);
    }

    static void Measure(Action action)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            action();
        }
        sw.Stop();
        Console.WriteLine("{0}: {1}ms", action.Method.Name,
                          sw.ElapsedMilliseconds);
    }

    static void ByteLoop()
    {
        for (byte index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void ShortLoop()
    {
        for (short index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void IntLoop()
    {
        for (int index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void BackToBack()
    {
        for (byte index = 0; index < 127; index++)
        {
            index.ToString();
        }
        for (short index = 0; index < 127; index++)
        {
            index.ToString();
        }
        for (int index = 0; index < 127; index++)
        {
            index.ToString();
        }
    }

    static void DelegateOverhead()
    {
        // Nothing. Let's see how much
        // overhead there is just for calling
        // this repeatedly...
    }
}

And the results:

ByteLoop: 6585ms
ShortLoop: 6342ms
IntLoop: 6404ms
BackToBack: 19757ms
DelegateOverhead: 1ms

(This is on a netbook - adjust the number of iterations until you get something sensible :)

That seems to show that it makes basically no significant difference which type you use.

Jon Skeet
But all the loops are writing to the console the same number of times, i.e. 127 × n loops.
runrunraygun
Although I guess `int.ToString()` could take longer than `byte.ToString()`, maybe?
runrunraygun
@runrunraygun: `Console.WriteLine` is an async operation with undependable execution time. While it's not exceedingly likely that it would have a dramatic effect on your results, use something more reliable. In addition, `int.ToString()` is not the same function as `byte.ToString()`, so you're not performing the same action in each loop.
Adam Robinson
@Adam: I've kept the int.ToString() vs byte.ToString() distinction in my benchmark, but removed the Console.WriteLine call. So this is comparing "looping with int and converting int to string" with "looping with short and converting short to string", etc.
Jon Skeet
@Jon: I can't imagine that it actually *matters*, but there's really no need to deal with the loop variable within the measurement loops, is there? You could just as easily have an `int` variable in every function that you call `ToString()` on within the loop.
Adam Robinson
+7  A: 

First of all, it's not .NET that's optimized for int performance; it's the machine, because 32 bits is the native word size (unless you're on x64, where the native word is 64 bits, i.e. a long).

Second, you're writing to the console inside each loop. That's going to be far more expensive than incrementing and testing the loop counter, so you're not measuring anything realistic here.

Third, a byte has a range of 0 to 255, so you can loop 255 times (if you try to include 255 itself, the increment will overflow back to 0 and the loop will never end), but you don't need to stop at 127; a sketch below illustrates the overflow.

Fourth, you're not doing anywhere near enough iterations to profile. Iterating a tight loop 127 or even 255 times is meaningless. What you should be doing is putting the byte/short/int loop inside another loop that iterates a much larger number of times, say 10 million, and checking the results of that.

Finally, using DateTime.Now for timing will introduce some "noise" into the profile. It's recommended (and easier) to use the Stopwatch class instead.

Bottom line, this needs many changes before it can be a valid perf test.
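
To illustrate the third point, here's a minimal sketch of the byte overflow (illustrative only, assuming C#'s default unchecked arithmetic):

using System;

class ByteOverflowDemo
{
    static void Main()
    {
        // The widest terminating byte loop runs 255 times (b = 0..254).
        int count = 0;
        for (byte b = 0; b < 255; b++)
            count++;
        Console.WriteLine(count); // prints 255

        // "b <= 255" would never terminate: b is promoted to int for the
        // comparison, but b itself can never exceed 255, because
        // incrementing 255 wraps around to 0 in the default unchecked context.
        // for (byte b = 0; b <= 255; b++) { }  // infinite loop
    }
}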


Here's what I'd consider to be a more accurate test program:

using System;
using System.Diagnostics;

class Program
{
    const int TestIterations = 5000000;

    static void Main(string[] args)
    {
        RunTest("Byte Loop", TestByteLoop, TestIterations);
        RunTest("Short Loop", TestShortLoop, TestIterations);
        RunTest("Int Loop", TestIntLoop, TestIterations);
        Console.ReadLine();
    }

    static void RunTest(string testName, Action action, int iterations)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        Console.WriteLine("{0}: Elapsed Time = {1}", testName, sw.Elapsed);
    }

    static void TestByteLoop()
    {
        int x = 0;
        for (byte b = 0; b < 255; b++)
            ++x;
    }

    static void TestShortLoop()
    {
        int x = 0;
        for (short s = 0; s < 255; s++)
            ++x;
    }

    static void TestIntLoop()
    {
        int x = 0;
        for (int i = 0; i < 255; i++)
            ++x;
    }
}

This runs each loop inside a much larger loop (5 million iterations) and performs a very simple operation inside the loop (incrementing a variable). The results for me were:

Byte Loop: Elapsed Time = 00:00:03.8949910
Short Loop: Elapsed Time = 00:00:03.9098782
Int Loop: Elapsed Time = 00:00:03.2986990

So, no appreciable difference.

Also, make sure you profile in release mode; a lot of people forget and test in debug mode, which will be significantly less accurate.
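
For reference, a release-style command-line build (assuming the classic csc toolchain; adjust the file name as needed) looks something like:

csc /optimize+ /debug- Program.cs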

Aaronaught
Ooh thanks, I've never really tried profiling my code before. Good points, taken on board :)
runrunraygun
@Aaronaught: I love how similar our benchmarks are :)
Jon Skeet
@Jon: I swear I didn't copy yours. :P
Aaronaught
@Aaronaught: Oh I wasn't thinking of that at all. Just amused.
Jon Skeet
Not a great deal to separate the answers, so the popular vote takes it by a nose. Cheers guys.
runrunraygun
A: 

Profiling .NET code is very tricky, because the run-time environment that the compiled byte code runs in can apply run-time optimisations to it. In your second example, the JIT compiler probably spotted the repeated code and created a more optimised version. Without a really detailed description of how the run-time system works, it's impossible to know what is going to happen to your code. And it would be foolish to try to guess based on experimentation, since Microsoft are perfectly within their rights to redesign the JIT engine at any time, provided they don't break any functionality.
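
One common guard against that sort of optimisation (a minimal sketch, illustrative rather than from the answer above) is to make the loop's result observable, so the JIT can't discard the work as dead code:

using System;
using System.Diagnostics;

class ObservableResultDemo
{
    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        long sum = 0;
        for (int i = 0; i < 10000000; i++)
        {
            sum += i;  // the work being timed
        }
        sw.Stop();

        // Consuming the result after timing stops prevents the JIT from
        // treating the loop as dead code and removing it outright.
        Console.WriteLine("sum = {0}, elapsed = {1}ms",
                          sum, sw.ElapsedMilliseconds);
    }
}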

Skizz
Running the code within the debugger (or, more accurately, compiling and running under the default settings for the Debug profile that a VS project is created with) all but eliminates the possibility of the sort of optimization you're talking about.
Adam Robinson
@Adam: But who'd run code under a debugger? I've noticed that in VS2005, code runs much slower within the debugger than stand-alone. IIRC, someone here mentioned that the output of the debug .NET compiler and the release .NET compiler were nearly identical, and it was the fact that the code was being run stand-alone, as opposed to within the debugger, that made the difference.
Skizz
@Skizz: Disabling optimizations (which is done by default in the Debug configuration) is specifically what eliminates the sort of "optimizing away" that you're talking about. Attaching *any* debugger can have a negative effect on performance, but that's a different issue. The output of the compiler with optimizations enabled is, indeed, different from the output with optimizations disabled.
Adam Robinson
+1  A: 

I tried out the two programs above as they looked like they would produce different and possibly conflicting results on my dev machine.

Outputs from Aaronaught's test harness:

Short Loop: Elapsed Time = 00:00:00.8299340
Byte Loop: Elapsed Time = 00:00:00.8398556
Int Loop: Elapsed Time = 00:00:00.3217386
Long Loop: Elapsed Time = 00:00:00.7816368

Ints are much quicker.
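
(The long test isn't shown in either program above; presumably it was added to Aaronaught's harness along these lines, as a hypothetical reconstruction:)

// Hypothetical reconstruction; the actual added test isn't shown.
static void TestLongLoop()
{
    int x = 0;
    for (long l = 0; l < 255; l++)
        ++x;
}

// ...registered in Main alongside the others:
// RunTest("Long Loop", TestLongLoop, TestIterations);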

Outputs from Jon's benchmark:

ByteLoop: 1126ms
ShortLoop: 1115ms
IntLoop: 1096ms
BackToBack: 3283ms
DelegateOverhead: 0ms

Nothing in it.

Jon's results include the big fixed cost of calling `ToString()`, which may be hiding the possible benefits that could occur if the work done in the loop were less. Aaronaught is using a 32-bit OS, which doesn't seem to benefit from using ints as much as the x64 rig I am using.

Hardware / Software: Results were collected on a Core i7 975 at 3.33 GHz with turbo disabled and the core affinity set to reduce the impact of other tasks. Performance settings were all set to maximum, and the virus scanner and unnecessary background tasks were suspended. Windows 7 x64 Ultimate with 11 GB of spare RAM and very little I/O activity. Run in release config built in VS 2008, without a debugger or profiler attached.

Repeatability: Originally repeated 10 times, changing the order of execution for each test. Variation was negligible, so I only posted my first result. Under max CPU load the ratio of execution times stayed consistent. Repeat runs on multiple x64 XP Xeon blades give roughly the same results after taking CPU generation and GHz into account.

Profiling: Red Gate / JetBrains / SlimTune / CLR Profiler and my own profiler all indicate that the results are correct.

Debug build: Using the debug settings in VS gives results consistent with Aaronaught's.

Steve
I'm running an x64 box. That's a pretty anomalous result for the first test - it looks like the `short` and `byte` versions took a lot longer than they should have, while the `int` version was very close to mine. Did you run the test a few times? Did you have anything else running at the same time?
Aaronaught
Have you tried re-ordering the short-byte-int loops to see if there's any difference? Just in case the JIT compiler is deciding that a third loop might be worth optimising as it appears to be a common operation. Just a thought. Would be interesting to see.
Skizz
@Aaronaught: Switching my config to x86 DLLs evened out my results. That's why I presumed you were using a 32-bit operating system.
Steve
@Skizz: I ruled that out early with multiple out-of-order runs. See the edits to my post.
Steve
A: 

Console writes have nothing to do with the actual performance of the data types; the time is dominated by the interaction with the console library calls. Suggest you do something interesting inside those loops that is data-size independent.

Suggestions: bit shifts, multiplies, array manipulation, addition, many others...
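
For instance, a minimal sketch along these lines (illustrative only, not from the original answer), replacing the console write with shift-and-add work whose result is consumed:

using System;
using System.Diagnostics;

class ArithmeticLoopDemo
{
    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        int acc = 0;
        for (int i = 0; i < 10000000; i++)
        {
            // Shifts and adds: cheap work whose cost doesn't depend on
            // console I/O, and whose result is consumed below so the
            // JIT can't discard it.
            acc += (i << 1) + (i >> 2);
        }
        sw.Stop();
        Console.WriteLine("acc = {0}, elapsed = {1}ms",
                          acc, sw.ElapsedMilliseconds);
    }
}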

drewk