Let's say I want to move a part of an array right by 1. I can either use Array.Copy or just make a loop copying elements one by one:

private static void BuiltInCopy<T>(T[] arg, int start) {
    int length = arg.Length - start - 1;
    Array.Copy(arg, start, arg, start + 1, length);
}

private static void ElementByElement<T>(T[] arg, int start) {
    for (int i = arg.Length - 1; i > start; i--) {
        arg[i] = arg[i - 1];
    }
}

private static void ElementByElement2<T>(T[] arg, int start) {
    int i = arg.Length - 1;
    while (i > start)
        arg[i] = arg[--i];
}

(ElementByElement2 was suggested by Matt Howells.)

I tested it using Minibench, and the results surprised me quite a lot.

internal class Program {
    private static int smallArraySize = 32;

    public static void Main(string[] args) {
        BenchArrayCopy();
    }

    private static void BenchArrayCopy() {
        var smallArrayInt = new int[smallArraySize];
        for (int i = 0; i < smallArraySize; i++)
            smallArrayInt[i] = i;

        var smallArrayString = new string[smallArraySize];
        for (int i = 0; i < smallArraySize; i++)
            smallArrayString[i] = i.ToString();

        var smallArrayDateTime = new DateTime[smallArraySize];
        for (int i = 0; i < smallArraySize; i++)
            smallArrayDateTime[i] = DateTime.Now;

        var moveInt = new TestSuite<int[], int>("Move part of array right by 1: int")
            .Plus(BuiltInCopy, "Array.Copy()")
            .Plus(ElementByElement, "Element by element (for)")
            .Plus(ElementByElement2, "Element by element (while)")
            .RunTests(smallArrayInt, 0);

        var moveString = new TestSuite<string[], string>("Move part of array right by 1: string")
            .Plus(BuiltInCopy, "Array.Copy()")
            .Plus(ElementByElement, "Element by element (for)")
            .Plus(ElementByElement2, "Element by element (while)")
            .RunTests(smallArrayString, "0");

        moveInt.Display(ResultColumns.All, moveInt.FindBest());
        moveString.Display(ResultColumns.All, moveInt.FindBest());
    }

    private static T ElementByElement<T>(T[] arg) {
        ElementByElement(arg, 1);
        return arg[0];
    }

    private static T ElementByElement2<T>(T[] arg) {
        ElementByElement2(arg, 1);
        return arg[0];
    }

    private static T BuiltInCopy<T>(T[] arg) {
        BuiltInCopy(arg, 1);
        return arg[0];
    }

    private static void BuiltInCopy<T>(T[] arg, int start) {
        int length = arg.Length - start - 1;
        Array.Copy(arg, start, arg, start + 1, length);
    }

    private static void ElementByElement<T>(T[] arg, int start) {
        for (int i = arg.Length - 1; i > start; i--) {
            arg[i] = arg[i - 1];
        }
    }

    private static void ElementByElement2<T>(T[] arg, int start) {
        int i = arg.Length - 1;
        while (i > start)
            arg[i] = arg[--i];
    }
}

Note that allocations are not being measured here. All methods just copy array elements. Since I am on a 32-bit OS, an int and a string reference take up the same amount of space on the stack.

This is what I expected to see:

  1. BuiltInCopy should be the fastest for two reasons: 1) it can do a memory copy; 2) List<T>.Insert uses Array.Copy. On the other hand, it's non-generic, and it can do a lot of extra work when the arrays have different types, so perhaps it doesn't take full advantage of 1).
  2. ElementByElement should be equally fast for int and string.
  3. BuiltInCopy should either be equally fast for int and string, or slower for int (in case it has to do some boxing).

However, all of these suppositions were wrong (at least, on my machine with .NET 3.5 SP1)!

  1. BuiltInCopy<int> is significantly slower than ElementByElement<int> for 32-element arrays. When size is increased, BuiltInCopy<int> becomes faster.
  2. ElementByElement<string> is over 4 times slower than ElementByElement<int>.
  3. BuiltInCopy<int> is faster than BuiltInCopy<string>.

Can anybody explain these results?

UPDATE: From a CLR Code Generation Team blog post on array bounds check elimination:

Advice 4: when you’re copying medium-to-large arrays, use Array.Copy, rather than explicit copy loops. First, all your range checks will be “hoisted” to a single check outside the loop. If the arrays contain object references, you will also get efficient “hoisting” of two more expenses related to storing into arrays of object types: the per-element “store checks” related to array covariance can often be eliminated by a check on the dynamic types of the arrays, and garbage-collection-related write barriers will be aggregated and become much more efficient. Finally, we will be able to use more efficient “memcpy”-style copy loops. (And in the coming multicore world, perhaps even employ parallelism if the arrays are big enough!)
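To make the "store checks" mentioned in the quote concrete, here is a minimal sketch of array covariance at work. The class and method names (`CovarianceDemo`, `TryStore`) are made up for illustration; only the language and runtime behavior they exercise is standard.

```csharp
using System;

static class CovarianceDemo
{
    // Hypothetical helper: returns true if storing 'value' into 'target'
    // passes the runtime store check, false if the CLR rejects it.
    public static bool TryStore(object[] target, object value)
    {
        try
        {
            // For arrays of reference types, this store is type-checked
            // at run time against the array's actual element type.
            target[0] = value;
            return true;
        }
        catch (ArrayTypeMismatchException)
        {
            return false;
        }
    }

    static void Main()
    {
        // Legal because arrays of reference types are covariant.
        object[] strings = new string[1];
        Console.WriteLine(TryStore(strings, "ok")); // True
        Console.WriteLine(TryStore(strings, 42));   // False: a boxed int is not a string
    }
}
```

An int[] cannot be viewed as an object[], so stores into it need no such check. That is one of the per-element costs that exists only for reference-type arrays.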

The last column is the score (total duration in ticks/number of iterations, normalized by the best result).

Two runs at smallArraySize = 32:

f:\MyProgramming\TimSort\Benchmarks\bin\Release>Benchmarks.exe
============ Move part of array right by 1: int ============
Array.Copy()               468791028 0:30.350 1,46
Element by element (for)   637091585 0:29.895 1,06
Element by element (while) 667595468 0:29.549 1,00

============ Move part of array right by 1: string ============
Array.Copy()               432459039 0:30.929 1,62
Element by element (for)   165344842 0:30.407 4,15
Element by element (while) 150996286 0:28.399 4,25


f:\MyProgramming\TimSort\Benchmarks\bin\Release>Benchmarks.exe
============ Move part of array right by 1: int ============
Array.Copy()               459040445 0:29.262 1,38
Element by element (for)   645863535 0:30.929 1,04
Element by element (while) 651068500 0:30.064 1,00

============ Move part of array right by 1: string ============
Array.Copy()               403684808 0:30.191 1,62
Element by element (for)   162646202 0:30.051 4,00
Element by element (while) 160947492 0:30.945 4,16

Two runs at smallArraySize = 256:

f:\MyProgramming\TimSort\Benchmarks\bin\Release>Benchmarks.exe
============ Move part of array right by 1: int ============
Array.Copy()               172632756 0:30.128 1,00
Element by element (for)    91403951 0:30.253 1,90
Element by element (while)  65352624 0:29.141 2,56

============ Move part of array right by 1: string ============
Array.Copy()               153426720 0:28.964 1,08
Element by element (for)    19518483 0:30.353 8,91
Element by element (while)  19399180 0:29.793 8,80


f:\MyProgramming\TimSort\Benchmarks\bin\Release>Benchmarks.exe
============ Move part of array right by 1: int ============
Array.Copy()               184710866 0:30.456 1,00
Element by element (for)    92878947 0:29.959 1,96
Element by element (while)  73588500 0:30.331 2,50

============ Move part of array right by 1: string ============
Array.Copy()               157998697 0:30.336 1,16
Element by element (for)    19905046 0:29.995 9,14
Element by element (while)  18838572 0:29.382 9,46
+5  A: 

A few points to note:

  • For BuiltInCopy, you've got one more method call per iteration - your first method calls the other overload which then calls Array.Copy. That's one bit of overhead.
  • Your implementations don't check exactly what they have to do for overlapping copies. Depending on whether they're moving things "up" or "down" (when the target array is the same as the source), they should work from the start or the end, to avoid corruption. Array.Copy will get this right - which is overhead.
  • Array.Copy takes general Array references, which could be multi-dimensional, different types etc. Your methods only ever work on a single array.
  • Array.Copy does a whole bunch of checking for rank, type compatibility etc. Your methods don't.
  • Your methods take fewer arguments, meaning less data needs to be copied on the method call.
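The overlap point can be illustrated with a small sketch. `OverlapDemo` and its method names are hypothetical, and the forward loop is deliberately the wrong direction for a right shift (the loops in the question run backwards, which is the safe direction for this particular move).

```csharp
using System;

static class OverlapDemo
{
    // Naive forward loop: copies low indices first, so each source
    // element has already been overwritten by the time it is read.
    public static void ShiftRightForward(int[] a, int start)
    {
        for (int i = start; i < a.Length - 1; i++)
            a[i + 1] = a[i];
    }

    // Array.Copy on overlapping ranges behaves as if the source range
    // had been saved to a temporary location before being written.
    public static void ShiftRightBuiltIn(int[] a, int start)
    {
        Array.Copy(a, start, a, start + 1, a.Length - start - 1);
    }

    static void Main()
    {
        int[] forward = { 1, 2, 3, 4, 5 };
        ShiftRightForward(forward, 0);
        Console.WriteLine(string.Join(",", forward)); // 1,1,1,1,1

        int[] builtIn = { 1, 2, 3, 4, 5 };
        ShiftRightBuiltIn(builtIn, 0);
        Console.WriteLine(string.Join(",", builtIn)); // 1,1,2,3,4
    }
}
```

A hand-written loop is only correct for one direction of move, whereas Array.Copy has to work out the safe direction on every call.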

I don't know how to account for the difference between reference types and value types, but the above should explain why it's not really a fair comparison between the built in copy and your routines.

Jon Skeet
Yes, these are good points.
Alexey Romanov
However, the difference between `int` and `string` for both approaches is more surprising/interesting to me.
Alexey Romanov
I've edited the question title to reflect this.
Alexey Romanov
If you have time to check, do you get similar differences between reference types and value types?
Alexey Romanov
I suspect it has something to do with how the GC tracks references. Copying a value type such as int is a mere replication of bits, while copying a reference to a reference-type object is a "little" more complicated than that... How little that "little" is, is the question.
Ray
I've found a blog post from CLR codegen team which gives at least a partial answer. Updated the question with the relevant information.
Alexey Romanov
A: 

Just to remove some variables, I tried the same test in VB 2008, without using function calls, and using a Stopwatch object instead of Minibench.

array size 32
ticks
1529 - integer, element by element copy
2613 - string, element by element copy
3619 - integer, array.copy
3649 - string, array.copy

However, when I tried an array size of 3,200,000, Array.Copy was still slower! Maybe Array.Copy does not use a memcpy equivalent. There are probably some differences in VB vs C++, and the function calls may be significant in the 32-member test.

array size 3,200,000
ticks
55,750,010 - integer, element by element copy
55,462,881 - string, element by element copy
69,500,804 - integer, array.copy
81,102,288 - string, array.copy

Source:

Dim clock As New Stopwatch
Dim t(4) As Integer
Dim iSize As Integer = 3200000
Dim smallArrayInt(iSize) As Integer
Dim smallArrayString(iSize) As String

For i = LBound(smallArrayInt) To UBound(smallArrayInt)
  smallArrayInt(i) = i
Next i

For i = LBound(smallArrayString) To UBound(smallArrayString)
  smallArrayString(i) = Str(i)
Next i

clock.Reset() : clock.Start()

t(0) = clock.ElapsedTicks
For i = 1 To iSize
  smallArrayInt(i - 1) = smallArrayInt(i)
Next i
t(1) = clock.ElapsedTicks - t(0)

For i = 1 To iSize
  smallArrayInt(i - 1) = smallArrayInt(i)
Next i
t(2) = clock.ElapsedTicks - t(1)

Array.Copy(smallArrayInt, 1, smallArrayInt, 0, iSize - 1)
t(3) = clock.ElapsedTicks - t(2)

Array.Copy(smallArrayString, 1, smallArrayString, 0, iSize - 1)
t(4) = clock.ElapsedTicks - t(3)

MsgBox(t(1) & ", " & t(2) & ", " & t(3) & ", " & t(4))
xpda
Am I missing something? Imagine all tests take the same time: 500 ticks, so `clock.ElapsedTicks` is 500 after the first tests, 1000 after the second one, etc. Then you have `t(0) = 500`, `t(1) = 1000 - 500 = 500`, `t(2) = 1500 - 500 = 1000`, `t(3) = 2000 - 1000 = 1000`, `t(4) = 2500 - 1000 = 1500` -- quite incorrect answers!
Alexey Romanov
You are correct. I plead temporary (hopefully) stupidity. Here are the correct answers.

n=32
8 - integer, element by element copy
8 - string, element by element copy
9 - integer, array.copy
7 - string, array.copy

n=256
15 - integer, element by element copy
14 - string, element by element copy
9 - integer, array.copy
8 - string, array.copy

n=3,200,000
102,7666 - integer, element by element copy
92,110 - string, element by element copy
29,406 - integer, array.copy
58,859 - string, array.copy

(Tick values are smaller -- I'm using a different machine tonight.)
xpda
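One way to avoid the cumulative-tick mistake discussed in the comments above is to give each measured section its own Stopwatch. This is only a sketch in C#: `TimingDemo` and `TimeTicks` are hypothetical names, it is not the code behind the corrected numbers, and a single unwarmed run like this is still a rough measurement.

```csharp
using System;
using System.Diagnostics;

static class TimingDemo
{
    // Times one action with a fresh Stopwatch, so ticks from earlier
    // sections cannot leak into the result.
    public static long TimeTicks(Action action)
    {
        Stopwatch clock = Stopwatch.StartNew();
        action();
        clock.Stop();
        return clock.ElapsedTicks;
    }

    static void Main()
    {
        const int size = 3200000;
        int[] ints = new int[size];
        string[] strings = new string[size];
        for (int i = 0; i < size; i++)
        {
            ints[i] = i;
            strings[i] = i.ToString();
        }

        // Each section is measured independently; note the second loop
        // works on the string array, not the int array again.
        long intLoop = TimeTicks(() => { for (int i = 1; i < size; i++) ints[i - 1] = ints[i]; });
        long strLoop = TimeTicks(() => { for (int i = 1; i < size; i++) strings[i - 1] = strings[i]; });
        long intCopy = TimeTicks(() => Array.Copy(ints, 1, ints, 0, size - 1));
        long strCopy = TimeTicks(() => Array.Copy(strings, 1, strings, 0, size - 1));

        Console.WriteLine("{0}, {1}, {2}, {3}", intLoop, strLoop, intCopy, strCopy);
    }
}
```

Wrapping the loops in lambdas adds a delegate call per section and may change which JIT optimizations apply, so the absolute numbers are not comparable with the inline VB version.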
+1  A: 

System.Buffer.BlockCopy is closer to C's memcpy but still has overhead. Your own method will generally be faster for small cases while BlockCopy will be faster for large cases.
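For reference, a minimal sketch of calling Buffer.BlockCopy on int arrays. `BlockCopyDemo` and `CopyInts` are hypothetical names. The detail worth noting is that the offsets and the count are in bytes rather than elements, and that the method accepts only arrays of primitive types, so it cannot be used for the string[] case at all.

```csharp
using System;

static class BlockCopyDemo
{
    // Copies 'count' ints from src to dst using Buffer.BlockCopy.
    // All offsets and lengths passed to BlockCopy are in BYTES.
    public static void CopyInts(int[] src, int srcIndex, int[] dst, int dstIndex, int count)
    {
        Buffer.BlockCopy(src, srcIndex * sizeof(int),
                         dst, dstIndex * sizeof(int),
                         count * sizeof(int));
    }

    static void Main()
    {
        int[] src = { 10, 20, 30, 40 };
        int[] dst = new int[5];
        CopyInts(src, 0, dst, 1, src.Length);
        Console.WriteLine(string.Join(",", dst)); // 0,10,20,30,40
    }
}
```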

Copying references is slower than copying ints because .NET has to do some extra work in most cases when you assign a reference - this extra work is related to garbage collection.

For a demonstration of this fact, look at the code below, which includes the native code for copying each string element vs. copying each int element (the native code is in comments). Notice that it actually makes a function call to assign the string reference to dst[i], while the int assignment is done inline:

    static void TestStrings()
    {
        string[] src = new string[5];
        for (int i = 0; i < src.Length; i++)
            src[i] = i.ToString();
        string[] dst = new string[src.Length];
        // Loop forever so we can break into the debugger when run
        // without debugger.
        while (true)
        {
            for (int i = 0; i < src.Length; i++)
                /*
                 * 0000006f  push        dword ptr [ebx+esi*4+0Ch] 
                 * 00000073  mov         edx,esi 
                 * 00000075  mov         ecx,dword ptr [ebp-14h] 
                 * 00000078  call        6E9EC15C 
                 */
                dst[i] = src[i];
        }
    }
    static void TestInts()
    {
        int[] src = new int[5];
        for (int i = 0; i < src.Length; i++)
            src[i] = i;
        int[] dst = new int[src.Length];
        // Loop forever so we can break into the debugger when run
        // without debugger.
        while (true)
        {
            for (int i = 0; i < src.Length; i++)
                /*
                 * 0000003d  mov         ecx,dword ptr [edi+edx*4+8] 
                 * 00000041  cmp         edx,dword ptr [ebx+4] 
                 * 00000044  jae         00000051 
                 * 00000046  mov         dword ptr [ebx+edx*4+8],ecx 
                 */
                dst[i] = src[i];
        }
    }
Joe Erickson
Didn't know about `System.Buffer` before, thanks!
Alexey Romanov
I would not at all be surprised that the extra overhead for reference type arrays is because of array variance, i.e., you can pass a string[] to an object[] variable/parameter, but assigning an element should still check whether it's a string you're assigning. In case of string[] -> string[] this shouldn't be necessary, but maybe that code path hasn't been optimized.
Ruben
CLR codegen team specifically says this is optimized: see the update.
Alexey Romanov