views: 84

answers: 2

I have a very large set of binary files from which several thousand raw video frames are read and processed sequentially, and I'm now looking to optimize this code since it appears to be more CPU-bound than I/O-bound.

The frames are currently being read in this manner, and I suspect this is the biggest culprit:

private byte[] frameBuf;
BinaryReader binRead = new BinaryReader(FS);

// Initialize a new buffer of sizeof(frame)
frameBuf = new byte[VARIABLE_BUFFER_SIZE];
// Read sizeof(frame) bytes from the file
frameBuf = binRead.ReadBytes(VARIABLE_BUFFER_SIZE);

Would it make much of a difference in .NET to re-organize the I/O to avoid creating all these new byte arrays with each frame?

My understanding of .NET's memory allocation mechanism is weak, as I am coming from a pure C/C++ background. My idea is to rewrite this to use a static buffer class containing one very large shared buffer, with an integer keeping track of the frame's actual size. But I love the simplicity and readability of the current implementation and would rather keep it if the CLR already handles this in some way I am not aware of.
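Roughly, what I have in mind is something like this (just a sketch to show the shape of the idea; the class name and the size constant are placeholders):

static class FrameBuffer
{
    // Large enough for the biggest possible frame; the real value would come from the file format.
    public const int MaxFrameSize = 4 * 1024 * 1024;

    // One buffer shared by every read instead of a new array per frame.
    public static readonly byte[] Data = new byte[MaxFrameSize];

    // Number of bytes of Data that belong to the frame that was just read.
    public static int FrameSize;
}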

Any input would be much appreciated.

+7  A: 

You don't need to initialize frameBuf if you use binRead.ReadBytes -- it returns a brand-new byte array, so the reference simply replaces the array you just allocated and that first allocation is wasted. It does, however, still create a new array for every read.

If you want to avoid creating a bunch of byte arrays, you could use binRead.Read, which will put the bytes into an array you supply to it. If other threads are using the array, though, they'll see the contents of it change right in front of them. Be sure you can guarantee you're done with the buffer before reusing it.
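For example, something along these lines (a minimal sketch; VARIABLE_BUFFER_SIZE and FS are from your question, and ProcessFrame is just a stand-in for whatever you do with each frame):

byte[] frameBuf = new byte[VARIABLE_BUFFER_SIZE];   // allocated once, reused for every frame

using (BinaryReader binRead = new BinaryReader(FS))
{
    // Read fills the array you pass in and returns the number of bytes actually read.
    // Note it may return fewer bytes than requested, so a real implementation that needs
    // whole frames would loop until the frame is complete.
    int bytesRead = binRead.Read(frameBuf, 0, frameBuf.Length);
    while (bytesRead > 0)
    {
        ProcessFrame(frameBuf, bytesRead);          // only the first bytesRead bytes are valid
        bytesRead = binRead.Read(frameBuf, 0, frameBuf.Length);
    }
}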

cHao
Thanks for pointing this out -- I'm sure my redundant allocation is slowing this down appreciably. The static shared array is exactly what I'm considering, but if the performance gain isn't large compared to creating the byte arrays, I'd much rather stick with the elegant solution, precisely because of the complication you outline (shared access).
rnd
+1  A: 

You need to be careful here. It is very easy to get completely bogus test results on code like this, results that never repro in real use. The problem is the file system cache: it caches the data you read from a file. The trouble starts when you run your test over and over again, tweaking the code and looking for improvements.

The second and subsequent times you run the test, the data no longer comes off the disk. It is still present in the cache, so it only takes a memory-to-memory copy to get it into your program. That's very fast: a microsecond or so of overhead plus the time needed to copy, which runs at bus speeds, at least 5 gigabytes per second on modern machines.

Your test will now show that you spend a lot of time allocating the buffer and processing the data, relative to the amount of time spent reading the data.

This will rarely repro in real use. The data won't be in the cache yet, so now the sluggish disk drive needs to seek to the data (many milliseconds) and it needs to be read off the disk platter (a couple of dozen megabytes per second, at best). Reading the data now takes a good three or four orders of magnitude longer. If you managed to make the processing step twice as fast, your program would actually only run about 0.05% faster. Give or take.
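One way to see where the time really goes is to time the read and the processing separately, for example with Stopwatch (a rough sketch; ProcessFrame stands in for the real per-frame work, and you'd want to run it on a file that is not already sitting in the file system cache):

var readTime = new System.Diagnostics.Stopwatch();
var workTime = new System.Diagnostics.Stopwatch();
byte[] frameBuf = new byte[VARIABLE_BUFFER_SIZE];

readTime.Start();
int bytesRead = binRead.Read(frameBuf, 0, frameBuf.Length);
readTime.Stop();

while (bytesRead > 0)
{
    workTime.Start();
    ProcessFrame(frameBuf, bytesRead);              // stand-in for the real processing
    workTime.Stop();

    readTime.Start();
    bytesRead = binRead.Read(frameBuf, 0, frameBuf.Length);
    readTime.Stop();
}

Console.WriteLine("read: {0} ms, processing: {1} ms",
    readTime.ElapsedMilliseconds, workTime.ElapsedMilliseconds);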

Hans Passant
That's a good point; however, I'm running my tests on a dataset that dwarfs my machine's memory by several gigabytes. What concerns me is that similar code in my old C++ library would process this dataset in less than half the time this takes. I did notice, though, that the profiler is warning that approximately 2,826 pages per second are being written to disk and that the app may be memory-bound. I am not explicitly disposing of any of these arrays -- could these be getting cached before the GC deallocates them?
rnd
These buffers are probably large, more than 85 KB, which gets them allocated on the large object heap (LOH). They'll be stuck there for a while, since it takes a gen 2 collection to free them. Nothing comes for free; reusing buffers when they are large is a good strategy in .NET as well.
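As a rough illustration of that threshold (a small sketch; the LOH cutoff for a byte array is about 85,000 bytes, and LOH objects are reported as generation 2 because they are only collected with gen 2):

byte[] small = new byte[1024];          // well under the threshold: regular small object heap
byte[] large = new byte[100 * 1024];    // over ~85,000 bytes: allocated directly on the LOH

Console.WriteLine(GC.GetGeneration(small));   // typically 0 for a fresh small allocation
Console.WriteLine(GC.GetGeneration(large));   // 2, because the LOH is collected with gen 2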
Hans Passant