(By the way, this refers to a 32-bit OS.)

SOME UPDATES:

  • This is definitely an alignment issue

  • Sometimes the alignment (for whatever reason?) is so bad that access to the double is more than 50x slower than its fastest access.

  • Running the code on a 64-bit machine reduces the issue, but I think it still alternated between two timings (I could get similar results by changing the double to a float on a 32-bit machine)

  • Running the code under Mono exhibits no issue -- Microsoft, any chance you can copy something from those Novell guys???


Is there a way to memory-align the allocation of classes in C#?

The following demonstrates (I think!) the badness of not having doubles aligned correctly. It does some simple math on a double stored in a class, timing each run, running 5 timed runs on the variable before allocating a new one and doing it over again.

Basically, the results look like you get either a fast, medium or slow memory position (on my ancient processor, these end up around 40, 80 or 120 ms per run).

I have tried playing with StructLayoutAttribute, but have had no joy - maybe something else is going on?

using System;
using System.Diagnostics;

class Sample
{
    class Variable { public double Value; }

    static void Main()
    {
        const int COUNT = 10000000;
        while (true)
        {
            var x = new Variable();
            for (int inner = 0; inner < 5; ++inner)
            {
                // Move the allocation here to allocate more often, which makes the 50x slowdown more likely to show up.
                var stopwatch = Stopwatch.StartNew();

                var total = 0.0;
                for (int i = 1; i <= COUNT; ++i)
                {
                    x.Value = i;
                    total += x.Value;
                }
                if (Math.Abs(total - 50000005000000.0) > 1)
                    throw new ApplicationException(total.ToString());

                Console.Write("{0}, ", stopwatch.ElapsedMilliseconds);
            }
            Console.WriteLine();
        }
    }
}

So I see lots of web pages about alignment of structs for interop, so what about alignment of classes?

(Or are my assumptions wrong, and there is another issue with the above?)

Thanks, Paul.

+2  A: 

Maybe the StructLayoutAttribute is what you are looking for?

Konamiman
Ahh, I should read the docs! It does say that is also for layout of classes! (silly attribute name!) I'll give it a burl...
Paul Westcott
Playing with these fields seems to have no effect on the result. Maybe the issue has some other cause?
Paul Westcott
A: 

Using a struct instead of a class makes the time constant. Also consider using StructLayoutAttribute; it lets you specify the exact memory layout of a structure. For CLASSES I do not think you have any guarantees about how they are laid out in memory.

ironic
I was after allocations for classes (which was said in the post). I have these classes embedded in lambda functions, where I want to be able to change the values, and I can't do that with a struct.
Paul Westcott
+2  A: 

Interesting look in the gears that run the machine. I have a bit of a problem explaining why there are multiple distinct values (I got 4) when a double can be aligned only two ways. I think alignment to the CPU cache line plays a role as well, although that only adds up to 3 possible timings.

Well, there is nothing you can do about it: the CLR only promises alignment for 4-byte values, so that atomic updates on 32-bit machines are guaranteed. This is not just an issue with managed code; C/C++ has this problem too. Looks like the chip makers need to solve this one.

If it is critical then you could allocate unmanaged memory with Marshal.AllocCoTaskMem() and use an unsafe pointer that you can align just right. Same kind of thing you'd have to do if you allocate memory for code that uses SIMD instructions, they require a 16 byte alignment. Consider it a desperation-move though.
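
A minimal sketch of that desperation move, assuming a single double is all that needs aligning. The AlignedDouble class and its members are hypothetical names made up for illustration; only Marshal.AllocCoTaskMem and Marshal.FreeCoTaskMem are real APIs. The block is over-allocated by 7 bytes so that an 8-byte-aligned slot always fits inside it. It has to be compiled with unsafe code enabled (/unsafe).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: one double kept in unmanaged memory at an 8-byte-aligned address.
public sealed unsafe class AlignedDouble : IDisposable
{
    private IntPtr raw;               // block returned by AllocCoTaskMem
    private readonly double* aligned; // first 8-byte-aligned address inside that block

    public AlignedDouble()
    {
        // Over-allocate by 7 bytes so an aligned 8-byte slot always fits.
        raw = Marshal.AllocCoTaskMem(sizeof(double) + 7);
        long address = (raw.ToInt64() + 7) & ~7L; // round up to a multiple of 8
        aligned = (double*)address;
        *aligned = 0.0;
    }

    public double Value
    {
        get { return *aligned; }
        set { *aligned = value; }
    }

    // True when the storage address is a multiple of 8.
    public bool IsAligned
    {
        get { return ((long)aligned % 8) == 0; }
    }

    public void Dispose()
    {
        if (raw != IntPtr.Zero)
        {
            Marshal.FreeCoTaskMem(raw);
            raw = IntPtr.Zero;
        }
    }
}
```

Unmanaged memory is never moved by the GC, so the address stays aligned for the lifetime of the object. The sketch has no finalizer, so the caller must call Dispose (or use a using block) to avoid leaking the allocation.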

Hans Passant
Sounds like some interesting plans! I might give the desperation move a go tomorrow ;)
Paul Westcott
A: 

You don't have any control over how .NET lays out your class in memory.

As others have said the StructLayoutAttribute can be used to force a specific memory layout for a struct BUT note that the purpose of this is for C/C++ interop, not for trying to fine-tune the performance of your .NET app.

If you're worried about memory alignment issues then C# is probably the wrong choice of language.


EDIT - Broke out WinDbg and looked at the heap running the code above on 32-bit Vista and .NET 2.0.

Note: I don't get the variation in timings shown above.

0:003> !dumpheap -type Sample+Variable
 Address       MT     Size
01dc2fec 003f3c48       16     
01dc54a4 003f3c48       16     
01dc58b0 003f3c48       16     
01dc5cbc 003f3c48       16     
01dc60c8 003f3c48       16     
01dc64d4 003f3c48       16     
01dc68e0 003f3c48       16     
01dc6cd8 003f3c48       16     
01dc70e4 003f3c48       16     
01dc74f0 003f3c48       16     
01dc78e4 003f3c48       16     
01dc7cf0 003f3c48       16     
01dc80fc 003f3c48       16     
01dc8508 003f3c48       16     
01dc8914 003f3c48       16     
01dc8d20 003f3c48       16     
01dc912c 003f3c48       16     
01dc9538 003f3c48       16     
total 18 objects
Statistics:
      MT    Count    TotalSize Class Name
003f3c48       18          288 TestConsoleApplication.Sample+Variable
Total 18 objects
0:003> !do 01dc9538 
Name: TestConsoleApplication.Sample+Variable
MethodTable: 003f3c48
EEClass: 003f15d0
Size: 16(0x10) bytes
 (D:\testcode\TestConsoleApplication\bin\Debug\TestConsoleApplication.exe)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
6f5746e4  4000001        4        System.Double  1 instance 1655149.000000 Value

It seems to me that the classes' allocation addresses are aligned, unless I'm reading this wrong?
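
One way to read the dump with respect to the double itself is to add the field offset to each object address and take the result modulo 8. The sketch below does that for the first four addresses in the listing. It assumes the Offset column (4) is relative to the object address printed by !dumpheap; HeapAlignment and its methods are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;

static class HeapAlignment
{
    // Remainder of (object address + field offset) modulo 8; 0 means the field is 8-byte aligned.
    public static long FieldMisalignment(string hexAddress, int fieldOffset)
    {
        long address = long.Parse(hexAddress, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        return (address + fieldOffset) % 8;
    }

    // Prints the remainder for the first four addresses from the !dumpheap listing.
    public static void Demo()
    {
        string[] addresses = { "01dc2fec", "01dc54a4", "01dc58b0", "01dc5cbc" };
        foreach (string a in addresses)
        {
            Console.WriteLine("{0}: (address + 4) % 8 = {1}", a, FieldMisalignment(a, 4));
        }
    }
}
```

Under that assumption the remainders are 0, 0, 4 and 0: every object address is a multiple of 4, but the double at offset 4 does not always land on an 8-byte boundary.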

Paolo
I see that as quite a sad comment. I agree that if you want super-duper performance we'd all be off writing CUDA or whatever, but to accept an obviously substandard implementation of how floating-point numbers are handled in normal use is just stupid.
Paul Westcott
Interesting. I have been playing on 32-bit XP, compiled with .NET 3.5 (across a number of different machines, from my old home Dell D800 laptop to whatever my current, relatively new desktop machine is at work), and have seen the issue on all of them. I'll have to give it a burl on Vista??
Paul Westcott
+1 If you're worried about memory alignment issues then C# is probably the wrong choice of language. Use the right tool for the job. C++ or ASM.
Andrew
See first comment on this msg - (re "sad" and "stupid")
Paul Westcott
A: 

It will be correctly aligned, otherwise you'd get alignment exceptions on x64. I don't know what your snippet shows, but I wouldn't say anything about alignment from it.

erikkallen
Further investigation has proven that it is COMPLETELY an alignment issue.
Paul Westcott
I don't believe this, unless we have a different definition of alignment. My definition of alignment is that a primitive must reside at an address that is a multiple of its size (e.g., the address of a 32-bit number must be divisible by 4, etc.). I cannot believe that objects are not always correctly aligned, and if you set certain processor flags on the x86, and always on the x64, you will get exceptions if you use unaligned values. Then there is the issue of caching, but that's a separate issue.
erikkallen
A WAY more likely cause for your issue is that sometimes your thread gets pre-empted.
erikkallen
I ruled out the caching/thread explanation immediately, as that was the whole purpose of running the timing 5 times with the same allocated object. Have you looked at the AlignedNew code that was posted? It checks where the object's address falls mod 8 (i.e. per 64 bits), and depending on where it falls we get the performance characteristics described.
Paul Westcott
+2  A: 

To prove the concept of misalignment of objects on the heap in .NET, you can run the following code and you'll see that it now always runs fast. Please don't shoot me, it's just a PoC, but if you are really concerned about performance you might consider using it ;)

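using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class AlignedNew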
{
    public static T New<T>() where T : new()
    {
        LinkedList<T> candidates = new LinkedList<T>();
        IntPtr pointer = IntPtr.Zero;
        bool continue_ = true;

        int size = Marshal.SizeOf(typeof(T)) % 8;

        while( continue_ )
        {
            if (size == 0)
            {
                object gap = new object();
            }

            candidates.AddLast(new T());

            GCHandle handle = GCHandle.Alloc(candidates.Last.Value, GCHandleType.Pinned);
            pointer = handle.AddrOfPinnedObject();
            continue_ = (pointer.ToInt64() % 8) != 0 || (pointer.ToInt64() % 64) == 24;

            handle.Free();

            if (!continue_)
                return candidates.Last.Value;
        }

        return default(T);
    }
}

class Program
{

    [StructLayoutAttribute(LayoutKind.Sequential)]
    public class Variable
    {
        public double Value;
    }

    static void Main()
    {

        const int COUNT = 10000000;

        while (true)
        {

            var x = AlignedNew.New<Variable>();


            for (int inner = 0; inner < 5; ++inner)
            {

                var stopwatch = Stopwatch.StartNew();

                var total = 0.0;
                for (int i = 1; i <= COUNT; ++i)
                {
                    x.Value = i;
                    total += x.Value;
                }
                if (Math.Abs(total - 50000005000000.0) > 1)
                    throw new ApplicationException(total.ToString());


                Console.Write("{0}, ", stopwatch.ElapsedMilliseconds);
            }
            Console.WriteLine();
        }

    }
}
Gregor Pacnik
Evil.Pure evil.
Paul Westcott
:) You are causing 100x overhead while creating a new object that way :) Not so evil.
Gregor Pacnik
Certainly evil, but quite inventive. This only partially works, however. If the GC decides to move your object later, then you may see a degradation in performance. There's no way to prevent this. Also, for large T objects or arrays of objects this could cause serious memory allocation issues.
Ade Miller