I've got a TCP server written in C# that processes POST data sent to it. It works fine unless a large amount of data (i.e. more than 1GB) is sent, in which case it runs out of memory, because I store everything in memory as an array of bytes (with an intermediary of a List DTO). For large files I now stream the data down to disk and pass the filename around, with the intention of streaming it back from disk.

Currently all of my routines are written to expect byte arrays which, in hindsight, was a little short-sighted. If I just convert the byte array to a MemoryStream, will it double the memory usage? I think rewriting my code to work on a MemoryStream will allow me to reuse it when I'm reading a stream from disk?

Sorry for the stupid questions, I'm never sure when C# takes a copy of the data and when it takes a reference.

A: 

A MemoryStream is just a stream wrapper around a byte array, so you won't be gaining anything by using it.

What you need to do (for large files at least) is open a FileStream and dump your data in that. At a lower level you have to read X bytes from your connection and then write that immediately to your file stream. This way you won't be pulling in a full gig into memory but only a few bytes at a time.

Whether or not this will be easy to do depends on how your TCP server is coded.
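
The read-then-write loop described above could look roughly like the sketch below. `UploadSpooler` and `SpoolToDisk` are hypothetical names, and the 80 KB default buffer is an arbitrary choice. It assumes `source` ends when the sender is done; a real HTTP server would instead stop after the number of bytes given by Content-Length.

```csharp
using System;
using System.IO;

public static class UploadSpooler
{
    // Hypothetical helper: copies everything from source into a file at path,
    // holding at most bufferSize bytes of payload in memory at any moment.
    public static long SpoolToDisk(Stream source, string path, int bufferSize = 81920)
    {
        long total = 0;
        byte[] buffer = new byte[bufferSize];

        using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            int read;
            // Read may return fewer bytes than requested; 0 means end of stream.
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                file.Write(buffer, 0, read);
                total += read;
            }
        }

        return total;
    }
}
```

On .NET 4 and later, `source.CopyTo(file)` performs essentially the same loop. The explicit version is shown only to make the fixed memory footprint visible.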

Will
+1  A: 

A single byte is a value type, so if you pass one byte to a function without the ref keyword you'll be dealing with a copy each time. A byte array (byte[]) is different: arrays are reference types, so passing one without ref copies only the reference, not the data. The ref keyword just lets the function reassign the caller's variable to point at a different array.

A MemoryStream is a reference type as well, so passing it around will not copy the data either. You're only passing a reference to that data, so your memory usage will not double from doing that.

Tony
+1  A: 

If you pass a byte[] into the MemoryStream constructor, the stream wraps that array as its (fixed-size) buffer rather than copying it. A copy only happens if you create an empty MemoryStream and Write the array into it, and even then the byte[] can be garbage collected as soon as you release it. Inherently there is no "doubling" (especially if you can set the size correctly to start with, and write directly to the Stream rather than the byte[]).

I would definitely switch to Stream (but only use Stream in the API - nothing more specific; your consuming code doesn't need to know which type). Most importantly, you can then choose to use a NetworkStream (to read directly from the socket), a FileStream (if you want to buffer to disk), or a MemoryStream (if you want to buffer in-process). You will also need to make sure you read that volume of data via stream-based code, rather than loading it all at once. Iterator blocks (yield return) can be very helpful here, as can the LINQ Enumerable methods (except for OrderBy, GroupBy, etc, which buffer).
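
As a rough illustration of a Stream-only API combined with an iterator block, here is a minimal sketch. `StreamProcessing`, `ReadLines` and `CountNonEmptyLines` are hypothetical names, and treating the payload as UTF-8 text split into lines is purely an assumption for the example; binary POST data would need a different unit of work.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class StreamProcessing
{
    // Accepts any Stream (NetworkStream, FileStream, MemoryStream, ...)
    // and yields one line at a time instead of materialising the payload.
    public static IEnumerable<string> ReadLines(Stream source)
    {
        // Disposing the reader also disposes the underlying stream.
        using (var reader = new StreamReader(source, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    // Count is a streaming LINQ operator, so only one line is held at a time.
    public static int CountNonEmptyLines(Stream source)
    {
        return ReadLines(source).Count(l => l.Length > 0);
    }
}
```

The caller decides where the bytes live: `CountNonEmptyLines(File.OpenRead(path))` for data spooled to disk, or `CountNonEmptyLines(new MemoryStream(bytes))` for small in-memory payloads, with no change to the consuming code.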

Neither passing a byte[] nor passing a Stream causes anything to get copied, as they are reference-types - the only thing copied is the reference (4 or 8 bytes, depending on x86/x64).
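
A small sketch showing the reference semantics in practice (the class and method names are made up for illustration): changes made by the callee are visible to the caller, because both hold a reference to the same object.

```csharp
using System;
using System.IO;

public static class ReferenceDemo
{
    // Mutates the caller's array: only the reference was copied, not the bytes.
    public static void ZeroFirstByte(byte[] data)
    {
        data[0] = 0;
    }

    // Advances the caller's stream: again, only the reference was copied.
    public static int ConsumeOneByte(Stream stream)
    {
        return stream.ReadByte();
    }

    public static void Main()
    {
        byte[] payload = { 1, 2, 3 };
        ZeroFirstByte(payload);
        Console.WriteLine(payload[0]);      // prints 0

        Stream stream = new MemoryStream(new byte[] { 10, 20, 30 });
        ConsumeOneByte(stream);
        Console.WriteLine(stream.Position); // prints 1
    }
}
```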

Marc Gravell