I am looking for the most efficient/direct way to do this simple C/C++ operation:

void ReadData(FILE *f, uint16 *buf, int startsamp, int nsamps)
{
   fseek(f, startsamp*sizeof(uint16), SEEK_SET);
   fread(buf, sizeof(uint16), nsamps, f);
}

in C#/.NET. (I'm ignoring return values for clarity; production code would check them.) Specifically, I need to read in many (potentially tens to hundreds of millions) 2-byte (16-bit) "ushort" integer data samples (fixed format, no parsing required) stored in binary in a disk file. The nice thing about the C way is that it reads the samples directly into the "uint16 *" buffer with no per-sample CPU work and no intermediate copy. Yes, it is potentially "unsafe", as it uses void * pointers to buffers of unknown size, but it seems like there should be a "safe" .NET alternative.

What is the best way to accomplish this in C#? I have looked around, and come across a few hints ("unions" using FieldOffset, "unsafe" code using pointers, Marshalling), but none seem to quite work for this situation, w/out using some sort of copying/conversion. I'd like to avoid BinaryReader.ReadUInt16(), since that is very slow and CPU intensive. On my machine there is about a 25x difference in speed between a for() loop with ReadUInt16(), and reading the bytes directly into a byte[] array with a single Read(). That ratio could be even higher with non-blocking I/O (overlapping "useful" processing while waiting for the disk I/O).

Ideally, I would want to simply "disguise" a ushort[] array as a byte[] array so I could fill it directly with Read(), or somehow have Read() fill the ushort[] array directly:

// DOES NOT WORK!!
public void GetData(FileStream f, ushort [] buf, int startsamp, int nsamps)
{
    f.Position = startsamp*sizeof(ushort);
    f.Read(buf, 0, nsamps);
}

But there is no Read() method that takes a ushort[] array, only a byte[] array.

Can this be done directly in C#, or do I need to use unmanaged code, or a third-party library, or must I resort to CPU-intensive sample-by-sample conversion? Although "safe" is preferred, I am fine with using "unsafe" code, or some trick with Marshal, I just have not figured it out yet.

Thanks for any guidance!


[UPDATE]

I wanted to add some code as suggested by dtb, as there seem to be precious few examples of ReadArray around. This is a very simple one, w/no error checking shown.

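public void ReadMap(string fname, short [] data, int startsamp, int nsamps)
{
    // Dispose the mapping and the view so the file is not left mapped/locked.
    using (var mmf = MemoryMappedFile.CreateFromFile(fname))
    using (var mmacc = mmf.CreateViewAccessor())
    {
        // The position is a byte offset; widen to long so large offsets do not overflow int.
        mmacc.ReadArray((long)startsamp*sizeof(short), data, 0, nsamps);
    }
}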

Data is safely dumped into your passed array. You can also specify a type for more complex types. It seems able to infer simple types on its own, but with the type specifier, it would look like this:

    mmacc.ReadArray<short>(startsamp*sizeof(short), data, 0, nsamps);

[UPDATE2]

I wanted to add the code as suggested by Ben's winning answer, in "bare bones" form, similar to above, for comparison. This code was compiled and tested, and works, and is FAST. I used the SafeFileHandle type directly in the DllImport (instead of the more usual IntPtr) to simplify things.

[DllImport("kernel32.dll", SetLastError=true)]
[return:MarshalAs(UnmanagedType.Bool)]
static extern bool ReadFile(SafeFileHandle handle, IntPtr buffer, uint numBytesToRead, out uint numBytesRead, IntPtr overlapped);

[DllImport("kernel32.dll", SetLastError=true)]
[return:MarshalAs(UnmanagedType.Bool)]
static extern bool SetFilePointerEx(SafeFileHandle hFile, long liDistanceToMove, out long lpNewFilePointer, uint dwMoveMethod);

unsafe void ReadPINV(FileStream f, short[] buffer, int startsamp, int nsamps)
{
    long unused; uint BytesRead;
    SafeFileHandle nativeHandle = f.SafeFileHandle; // clears Position property
    SetFilePointerEx(nativeHandle, (long)startsamp*sizeof(short), out unused, 0); // 0 = FILE_BEGIN; long avoids int overflow

    fixed(short* pFirst = &buffer[0])
        ReadFile(nativeHandle, (IntPtr)pFirst, (uint)nsamps*sizeof(short), out BytesRead, IntPtr.Zero);
}
+7  A: 

You can use a MemoryMappedFile. After you have memory-mapped the file, you can create a view (i.e. a MemoryMappedViewAccessor) which provides a ReadArray<T> method. This method can read structs from the file without marshalling, and it works with primitive types like ushort.

dtb
This is a great approach if you have .NET 4. It has even less copying than the C code dale wanted to emulate. In older versions of .NET you'd probably have to p/invoke `ReadFile` to emulate the C code, or p/invoke `CreateFileMapping` for this faster way.
Ben Voigt
dtb, thanks, I had not seen ReadArray(), and indeed even Google is not very aware of it yet! It looks like a VERY handy tool. I did some timing, and it is about twice as fast as a for() loop with ReadUInt16(), so I suspect it is doing some copying under the hood (reading bytes w/out conversion is still about 10x faster). I see that the Accessor class has many similar methods to BinaryReader. I wonder if MS could eventually add a ReadArray() method to BinaryReader, then we could just read structures directly from a stream w/out having to go through the memory mapping.
dale
You're right of course, since .NET metadata is stored in the same memory block as the content, it has no choice but to copy. If you P/Invoke `CreateFile` and `ReadFile` passing a pointer to the first element of your `ushort[]` (requires unsafe code) you should get the same speed as reading a `byte[]`.
Ben Voigt
Also, you don't need an `unsafe` code block per se, you can use http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.marshal.unsafeaddrofpinnedarrayelement(v=VS.100).aspx
Ben Voigt
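A minimal sketch of the pinning approach mentioned in the comment above, with no `unsafe` block. The class and method names (`PinnedArrayDemo`, `FillFromBytes`) are hypothetical, and `Marshal.Copy` only stands in for whatever native call (such as `ReadFile`) would write into the `IntPtr`; this is an illustration under those assumptions, not the code from the thread.

```csharp
using System;
using System.Runtime.InteropServices;

static class PinnedArrayDemo
{
    // Hypothetical helper: writes raw bytes into dest starting at dest[destIndex]
    // through a pinned address, the way a native writer would.
    public static void FillFromBytes(byte[] raw, ushort[] dest, int destIndex)
    {
        if (raw == null) throw new ArgumentNullException("raw");
        if (dest == null) throw new ArgumentNullException("dest");
        if (destIndex < 0 || destIndex > dest.Length ||
            raw.Length > (long)(dest.Length - destIndex) * sizeof(ushort))
            throw new ArgumentException("raw does not fit in dest");

        // Pin the array so the GC cannot move it while its address is in use.
        GCHandle pin = GCHandle.Alloc(dest, GCHandleType.Pinned);
        try
        {
            // Address of dest[destIndex]; no fixed statement or pointer syntax needed.
            IntPtr p = Marshal.UnsafeAddrOfPinnedArrayElement(dest, destIndex);
            // Stand-in for the native call that fills the buffer.
            Marshal.Copy(raw, 0, p, raw.Length);
        }
        finally
        {
            pin.Free(); // always unpin, even if the write throws
        }
    }
}
```

In real use the `IntPtr` would be handed to the P/Invoked function in place of the `Marshal.Copy` call, and the handle must stay allocated until that call returns.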
Oh hey, you don't need to p/invoke `CreateFile`, you can pass the `FileStream`'s `SafeFileHandle` property directly to `SetFilePointerEx` and `ReadFile`.
Ben Voigt
Ben, I will have a look at that. Don't you want to put this suggestion as an answer?? It could be a winner.
dale
ok I went ahead and elaborated on that in my answer with sample code.
Ben Voigt
I agree with Ben that, in general, dtb's method is "better", in that it is more .NET-ish, does not use DllImport or unsafe code, is straightforward, and is more general. I will definitely keep it in my toolbox. However, since I'd asked for a way to do this specific task (direct reading of int16's from a file into a C# array) as fast as possible, w/out any CPU involvement in copying/converting each sample, I gave the winning vote to Ben's answer, which best met the original constraints. Thanks to all for answering!
dale
+2  A: 

dtb's answer is an even better way (actually, it has to copy the data as well, no gain there), but I just wanted to point out that to extract ushort values from a byte array you should be using BitConverter, not BinaryReader.

EDIT: example code for p/invoking ReadFile:

[DllImport("kernel32.dll", SetLastError=true)]
[return:MarshalAs(UnmanagedType.Bool)]
static extern bool ReadFile(SafeFileHandle handle, IntPtr buffer, uint numBytesToRead, out uint numBytesRead, IntPtr overlapped);

[DllImport("kernel32.dll", SetLastError=true)]
[return:MarshalAs(UnmanagedType.Bool)]
static extern bool SetFilePointerEx(SafeFileHandle hFile, long liDistanceToMove, out long lpNewFilePointer, uint dwMoveMethod);

// Reads count ushort values from the stream's current position into buffer[offset..].
// Needs: using System; using System.IO; using System.Runtime.InteropServices; using Microsoft.Win32.SafeHandles;
unsafe bool read(FileStream fs, ushort[] buffer, int offset, int count)
{
  if (null == fs) throw new ArgumentNullException("fs");
  if (null == buffer) throw new ArgumentNullException("buffer");
  if (offset < 0 || count < 0 || offset > buffer.Length - count) throw new ArgumentException();
  if (count == 0) return true; // nothing to read; also avoids indexing past the end below
  uint bytesToRead = 2 * (uint)count; // count <= int.MaxValue, so this cannot overflow a uint
  long filePos = fs.Position; // separate name: "offset" is already the buffer index parameter
  SafeFileHandle nativeHandle = fs.SafeFileHandle; // clears Position property
  try {
    long unused;
    if (!SetFilePointerEx(nativeHandle, filePos, out unused, 0)) // 0 = FILE_BEGIN
      return false;
    uint bytesRead;
    fixed (ushort* pFirst = &buffer[offset])
      if (!ReadFile(nativeHandle, new IntPtr(pFirst), bytesToRead, out bytesRead, IntPtr.Zero))
        return false;
    if (bytesRead < bytesToRead)
      return false;
    filePos += bytesRead;
    return true;
  }
  finally {
    fs.Position = filePos; // restore Position property
  }
}
Ben Voigt
Ben, thanks, I had a look at BitConverter(), but I'm not sure I understand your suggestion. BinaryReader() is for reading from files (which I'm doing), and BitConverter() is for converting existing byte[] arrays to other types. Isn't BinaryReader().ReadUInt16() equivalent to reading in the bytes to an array and calling BitConverter().ToUInt16()? Maybe I'm misunderstanding...
dale
But `ReadUInt16` only reads one element at a time... which is a lousy way to do I/O.
Ben Voigt
No it isn't. BinaryReader is responsible for converting the bytes of an underlying stream to the requested type, not reading the bytes from IO in the first place.
Panagiotis Kanavos
Ah, thank you Ben. I was missing that BitConverter() can operate on arrays. Thanks.
dale
[Hmm, I couldn't edit my last comment] ...but wait, looking closer at ToUInt16(), I see it takes an array argument and an index, but still only converts one item at a time. Strange. In other words, you still have to loop when using ToUInt16(), unless I am again missing something, which is entirely possible.
dale
Right. It only operates on one element at a time (when inlining takes place this isn't a real problem) but can pull the data out of anywhere in the byte array, you don't have to prepare separate two-byte arrays.
Ben Voigt
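To make the point of the last two comments concrete, here is a rough sketch of the loop being discussed: `BitConverter.ToUInt16` converts one element per call, but it can read from any index of one large `byte[]`. The class and method names (`BitConverterDemo`, `ToUInt16Array`) are made up for illustration.

```csharp
using System;

static class BitConverterDemo
{
    // Hypothetical helper: converts count 16-bit samples starting at byteOffset in raw.
    // Each call to ToUInt16 handles one sample, but no two-byte temporary arrays are needed.
    public static ushort[] ToUInt16Array(byte[] raw, int byteOffset, int count)
    {
        var result = new ushort[count];
        for (int i = 0; i < count; i++)
        {
            // Interprets raw[k], raw[k+1] in the machine's native byte order.
            result[i] = BitConverter.ToUInt16(raw, byteOffset + 2 * i);
        }
        return result;
    }
}
```

This still costs one call per sample, so it sits between `ReadUInt16()` on a stream and a true bulk read in terms of CPU work.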
Ben, this works great. I did the timing, and it is essentially identical to reading the bytes directly, which is exactly what I was looking for. I suppose you could also use your Marshal.UnsafeAddrOfPinnedArrayElement suggestion above, which essentially accomplishes the same thing. Thanks for taking the time to write out some code, too!
dale
BTW: An array version of BitConverter (doesn't need unsafe code or pointers, but still does a copy): http://msdn.microsoft.com/en-us/library/system.buffer.blockcopy.aspx
Ben Voigt
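For comparison, a minimal sketch of the `Buffer.BlockCopy` route from the comment above: read everything into a `byte[]` with bulk `Read()` calls, then copy the bytes into the `ushort[]` in one call. It is fully "safe" code but, as noted, it does one extra copy. The names `BlockCopyDemo` and `ReadSamples` are hypothetical, and the samples are assumed to be stored in the machine's native byte order.

```csharp
using System;
using System.IO;

static class BlockCopyDemo
{
    // Hypothetical helper: reads up to nsamps 16-bit samples starting at sample index
    // startsamp into buf[0..], and returns the number of whole samples actually read.
    public static int ReadSamples(Stream s, ushort[] buf, long startsamp, int nsamps)
    {
        if (s == null) throw new ArgumentNullException("s");
        if (buf == null) throw new ArgumentNullException("buf");
        if (startsamp < 0 || nsamps < 0 || nsamps > buf.Length) throw new ArgumentException();

        s.Position = startsamp * sizeof(ushort);
        // checked: fail loudly instead of wrapping if nsamps is above ~1 billion.
        byte[] raw = new byte[checked(nsamps * sizeof(ushort))];

        // Read() may return fewer bytes than requested, so loop until full or end of stream.
        int got = 0;
        while (got < raw.Length)
        {
            int n = s.Read(raw, got, raw.Length - got);
            if (n == 0) break; // end of stream
            got += n;
        }

        int whole = got / sizeof(ushort);
        // BlockCopy offsets and count are in bytes, not elements.
        Buffer.BlockCopy(raw, 0, buf, 0, whole * sizeof(ushort));
        return whole;
    }
}
```

Taking a `Stream` rather than a `FileStream` is a design choice made here so the sketch can be tried against a `MemoryStream`; a `FileStream` works the same way.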