tags:

views:

181

answers:

5

It seems that C# is faster at adding two arrays of UInt16[] than it is at adding two arrays of int[]. This makes no sense to me, since I would have assumed the arrays would be word-aligned, and thus int[] would require less work from the CPU, no?

I ran the test-code below, and got the following results:

Int    for 1000 took 9896625613 tick (4227 msec)
UInt16 for 1000 took 6297688551 tick (2689 msec)

The test code does the following:

  1. Creates two arrays named a and b, once.
  2. Fills them with random data, once.
  3. Starts a stopwatch.
  4. Adds a and b, item-by-item. This is done 1000 times.
  5. Stops the stopwatch.
  6. Reports how long it took.

This is done for int[] a, b and for UInt16 a,b. And every time I run the code, the tests for the UInt16 arrays take 30%-50% less time than the int arrays. Can you explain this to me?

Here's the code, if you want to try if for yourself:

public static UInt16[] GenerateRandomDataUInt16(int length)
{
    UInt16[] noise = new UInt16[length];
    Random random = new Random((int)DateTime.Now.Ticks);
    for (int i = 0; i < length; ++i)
    {
        noise[i] = (UInt16)random.Next();
    }

    return noise;
}

public static int[] GenerateRandomDataInt(int length)
{
    int[] noise = new int[length];
    Random random = new Random((int)DateTime.Now.Ticks);
    for (int i = 0; i < length; ++i)
    {
        noise[i] = (int)random.Next();
    }

    return noise;
}

public static int[] AddInt(int[] a, int[] b)
{
    int len = a.Length;
    int[] result = new int[len];
    for (int i = 0; i < len; ++i)
    {
        result[i] = (int)(a[i] + b[i]);
    }
    return result;
}

public static UInt16[] AddUInt16(UInt16[] a, UInt16[] b)
{
    int len = a.Length;
    UInt16[] result = new UInt16[len];
    for (int i = 0; i < len; ++i)
    {
        result[i] = (ushort)(a[i] + b[i]);
    }
    return result;
}


public static void Main()
{
    int count = 1000;
    int len = 128 * 6000;

    int[] aInt = GenerateRandomDataInt(len);
    int[] bInt = GenerateRandomDataInt(len);

    Stopwatch s = new Stopwatch();
    s.Start();
    for (int i=0; i<count; ++i) 
    {
        int[] resultInt = AddInt(aInt, bInt);
    }
    s.Stop();
    Console.WriteLine("Int    for " + count 
                + " took " + s.ElapsedTicks + " tick (" 
                + s.ElapsedMilliseconds + " msec)");

    UInt16[] aUInt16 = GenerateRandomDataUInt16(len);
    UInt16[] bUInt16 = GenerateRandomDataUInt16(len);

    s = new Stopwatch();
    s.Start();
    for (int i=0; i<count; ++i) 
    {
        UInt16[] resultUInt16 = AddUInt16(aUInt16, bUInt16);
    }
    s.Stop();
    Console.WriteLine("UInt16 for " + count 
                + " took " + s.ElapsedTicks + " tick (" 
                + s.ElapsedMilliseconds + " msec)");


}
+2  A: 

Arrays are word aligned, but there is no reason why entries in the array should be word aligned.

Dipstick
+1  A: 

Just a SWAG: the smaller memory use of the UInt16 arrays has improved memory characteristics (GC, cache, who knows what else). Since there doesn't seem to be too many allocations, I'd guess that cache is the main factor.

Also, you should take care that benchmarking can be a tricky business - it looks like your times are probably including some of the JIT compilation, which might be skewing results. You might try reversing the order that you test the int array with the UInt16 array and see if the timings follow along or not.

Jon Skeet has (or had) a simple benchmark framework he coded up way back when that tried to take these effects into account. I don't know if it's still available (or even applicable); maybe he'll comment.

Michael Burr
+5  A: 

What happens is that you see a leaky abstraction. UInt16 takes half of memory that int does (16 vs. 32 bit).

This means that the memory area occupied by int16 array takes half of area that the int32 does. So more of that area can fit in processor cache and thus be accessed very quickly.

You could try that code on a processor that has more cache and the difference is likely to be smaller.

Also try with much larger arrays.

Pasi Savolainen
Conversely, try it with smaller arrays that fit inside one cache line.
Konrad Rudolph
+1  A: 

Couple of factors

1) You are also timing the generation of the resultant array..so it would be interesting to see how much time it took to just to the adding vs creating the result array that gets passed back

2) It would be interesting to see what IL is generated. Since your code is VERY straightforward (iterate and add), the compiler might be optimizing this, maybe stuffing multiple uint16's in a larger register and doing multiple additions per instruction

puffpio
I checked it in Reflector, and that's not what's happening. The code is algorithmically virtually identical. All operations are the same but adjusted for their appropriate data types. The only significant difference is the addition of a `conv.u2` operation following the `add` in the `UInt16` case (`add` returns an int, I suppose - can't find documentation to back it up, though it stands to reason since that's the way C# works, too). If the difference were in the IL, I'd expect the `UInt16` version to be slower, thanks to that extra conversion. My bet's on the cache miss theory.
Dathan
+1  A: 

I am not expert in .NET but I would check two things:

  1. Passing larger array (N elements of type int) takes more time then array of N ushort elements. This can tested using various size of arrays and style of coding -- see my comment to question). Numbers from your tests fit this theory :).
  2. Adding two ushort variables can be implemented as adding two int with result of type int -- without checking overflowing. And I assume that handling in code any kind of exception (including overflow exception) is time consuming task. This can be checked in .NET documentation.
Grzegorz Gierlik
FYI, VS 2008 uses the `add` IL operation when compiling the above, and the [CIL spec](http://download.microsoft.com/download/7/3/3/733AD403-90B2-4064-A81E-01035A7FE13C/MS%20Partition%20III.pdf) states that the `add` operation doesn't check for overflow.
Dathan