In C#, I need to write a T[] to a stream, ideally without any additional buffers. I have dynamically generated code that converts a T[] (where T is a struct containing no object references) to a void* and fixes it in memory, and that works great. When the stream was a file, I could use the native Windows API to pass the void* directly, but now I need to write to a generic Stream object that takes a byte[].

Question: Can anyone suggest a hack to create a dummy array object that does not own a heap allocation of its own, but instead points to an already existing (and fixed) heap location?

This is the pseudo-code that I need:

void Write(Stream stream, T[] buffer)
{
    fixed( void* ptr = &buffer )    // done with dynamic code generation
    {
        int typeSize = sizeof(T);   // done as well

        byte[] dummy = (byte[]) ptr;   // <-- how do I create this fake array?

        stream.Write( dummy, 0, buffer.Length*typeSize );
    }
}  

Update: I described how to do fixed(void* ptr=&buffer) in depth in this article. I could always create a byte[], fix it in memory, do an unsafe byte copy from one pointer to the other, and then send that array to the stream, but I was hoping to avoid the extra allocation and copying.

Impossible? On further thought, a byte[] carries metadata on the heap describing the array dimensions and the element type. Simply passing a reference (pointer) to the T[] as a byte[] might not work, because the metadata of the block would still be that of a T[]. And even if the structure of the metadata is identical, the recorded length of the T[] will be much smaller than the length the byte[] should have, so any subsequent access to the byte[] by managed code will produce incorrect results.

A: 

Because stream.Write cannot take a pointer, you cannot avoid copying memory, so you will have some slowdown. You might want to consider using a BinaryReader and BinaryWriter to serialize your objects, but here is code that will let you do what you want. Keep in mind that all members of T must also be structs.

// Requires: using System; using System.IO; using System.Runtime.InteropServices;
unsafe static void Write<T>(Stream stream, T[] buffer) where T : struct
{
    // Pin the array so the GC cannot move it while it is read through a raw pointer.
    GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
    try
    {
        // Marshal.SizeOf matches the in-memory element size only for blittable T.
        int byteCount = Marshal.SizeOf(typeof(T)) * buffer.Length;
        byte* ptr = (byte*)handle.AddrOfPinnedObject().ToPointer();
        byte* endPtr = ptr + byteCount;
        while (ptr != endPtr)
        {
            stream.WriteByte(*ptr++);
        }
    }
    finally
    {
        handle.Free();   // release the pin even if the stream throws
    }
}
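On runtimes newer than the ones discussed in this thread (.NET Core 2.1 and later), the per-byte loop may be avoidable: `Stream` gained a `Write(ReadOnlySpan<byte>)` overload, and `MemoryMarshal.AsBytes` can reinterpret a `Span<T>` as bytes without creating a new array. A minimal sketch under that assumption; `SpanWriter` and `WriteArray` are illustrative names, not part of the original answer:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class SpanWriter
{
    // Hypothetical helper: writes the raw bytes of a struct array to a stream.
    // Assumes .NET Core 2.1+ and that T contains no reference-type fields
    // (MemoryMarshal.AsBytes throws ArgumentException otherwise).
    public static void WriteArray<T>(Stream stream, T[] buffer) where T : struct
    {
        // Reinterpret the existing array memory as bytes; no new array is created.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(buffer.AsSpan());
        stream.Write(bytes);
    }
}
```

Whether the copy really disappears depends on the concrete stream: subclasses that do not override the span overload fall back to copying through a pooled byte[] inside the base `Stream` class.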
John JJ Curtis
Jeff, Marshal.Copy is unbelievably slow, and can easily be avoided. I described it in depth in the article at http://www.codeproject.com/KB/cs/ReadingStructuresEmit.aspx. I could always create a byte[], use my approach for fast byte copying from one fixed pointer to another, and then send that array to the stream, but I was hoping to avoid unneeded allocation and copying.
Yurik
Edited, although I'm not sure how writing one byte at a time will perform
John JJ Curtis
For .NET 4.0 I would go with nobugz's answer; otherwise I think this might be as good as you can get pre-.NET 4.0.
John JJ Curtis
A: 

This kind of code can never work in a generic way. It relies on a hard assumption that the memory layout of T is predictable and consistent, which is only true if T is a simple value type (ignoring endianness for a moment). You are dead in the water if T is a reference type: you would be copying tracking handles that can never be deserialized. At a minimum, you have to give T the struct constraint.

But that's not enough; structure types are not safely copyable either, not even when they have no reference type fields, which is something you can't express as a constraint. The internal layout is determined by the JIT compiler. It swaps fields at its leisure, selecting a layout in which the fields are properly aligned and the structure value takes the minimum storage size. The value you serialize can only be read properly by a program that runs on the exact same CPU architecture and JIT compiler version.
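For structs whose layout is declared explicitly, the padding the runtime inserts can at least be observed. A small sketch, assuming two hypothetical two-field structs, of how `Pack = 1` changes the size that `Marshal.SizeOf` reports. Note that `Marshal.SizeOf` describes the marshaled (unmanaged) layout, which coincides with the in-memory layout only for blittable types:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sample types, for illustration only.
[StructLayout(LayoutKind.Sequential)]
public struct DefaultPacked { public byte Flag; public int Value; }

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct TightlyPacked { public byte Flag; public int Value; }

public static class LayoutDemo
{
    // Size with default packing: the int is aligned to a 4-byte boundary.
    public static int SizeOfDefault() => Marshal.SizeOf(typeof(DefaultPacked));

    // Size with Pack = 1: no padding between the byte and the int.
    public static int SizeOfPacked() => Marshal.SizeOf(typeof(TightlyPacked));
}
```

With default packing, three bytes of padding follow `Flag`, so the first size is 8. With `Pack = 1` the padding disappears and the second size is 5, which is why a file written with one declaration cannot be read back with the other.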

There are already plenty of classes in the framework that do what you are doing. The closest match is the .NET 4.0 MemoryMappedViewAccessor class. It needs to do the same job, making raw bytes available in the memory-mapped file. The workhorse there is the System.Runtime.InteropServices.SafeBuffer class; have a look-see with Reflector. Unfortunately, you can't just copy the class, because it relies on the CLR to make the transformation. Then again, it is only another week before it's available.
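As a rough illustration of the accessor mentioned above, the sketch below writes an int[] into an anonymous memory-mapped file with `WriteArray` and reads the same region back as raw bytes with `ReadArray`. `MappedArrayDemo` and `RoundTrip` are made-up names, and the byte[] is allocated here only so the result can be inspected:

```csharp
using System;
using System.IO.MemoryMappedFiles;

public static class MappedArrayDemo
{
    // Hypothetical helper: round-trips an int[] through an anonymous
    // memory-mapped file and returns the mapped region re-read as bytes.
    // Assumes values is non-empty (the map capacity must be positive).
    public static byte[] RoundTrip(int[] values)
    {
        long capacity = values.Length * sizeof(int);
        using (MemoryMappedFile map = MemoryMappedFile.CreateNew(null, capacity))
        using (MemoryMappedViewAccessor view = map.CreateViewAccessor(0, capacity))
        {
            // Copies the struct array into the view without an intermediate byte[].
            view.WriteArray(0, values, 0, values.Length);

            // Read the same region back as bytes, for inspection only.
            byte[] raw = new byte[capacity];
            view.ReadArray(0, raw, 0, raw.Length);
            return raw;
        }
    }
}
```

This only helps when the destination is a memory-mapped file; it does not by itself hand a byte[] to an arbitrary Stream.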

Hans Passant
Nobugz, thanks for all the 4.0 info - I'll look at it in depth. As for the T - my code checks for struct only, no ref types, with either explicit or sequential pack 1 structs. That should take care of all issues you have raised. Take a look at http://code.google.com/p/timeseriesdb/ for the working implementation.
Yurik
If this already works, then what is the point of the question? Spam?
Hans Passant
No, of course not :). I can only fix an array of T in heap and get a void*, which is great when working with mem mapped files or other win API. It does not work with any methods that take byte[].
Yurik
A: 

Check out my answer to a related question: http://stackoverflow.com/questions/619041/what-is-the-fastest-way-to-convert-a-float-to-a-byte/3577253#3577253

In it I temporarily transform an array of floats to an array of bytes without memory allocation and copying. To do this I changed the CLR's metadata using memory manipulation.

Unfortunately, this solution does not lend itself well to generics. However, you can combine this hack with code generation techniques to solve your problem.

Omer Mor
Omer: Your technique won't work here because it requires the metadata to be in place. If you've got a `void*` pointing to a memory-mapped file, there's no way to get the metadata there without modifying the contents of the file.
Gabe