ansaurus

Question

.NET System.OutOfMemoryException on String.Split() of 120 MB CSV file

Answer 1

+6 A:

You can get an OutOfMemoryException for basically any size of allocation. When you allocate a piece of memory, you're really asking for a contiguous piece of memory of the requested size. If that request cannot be honored, you'll see an OutOfMemoryException.

You should also be aware that unless you're running 64-bit Windows, the 4 GB address space of your process is split into 2 GB of kernel space and 2 GB of user space, so by default your .NET application cannot use more than 2 GB.
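To check which case applies to your own process, a quick test like the one below can help. This is a minimal sketch: `ProcessBitness` and `Describe` are hypothetical names, and `Environment.Is64BitProcess` requires .NET Framework 4.0 or later, which is newer than this thread.

```csharp
using System;

public static class ProcessBitness
{
    // Reports whether the current process is 32-bit or 64-bit.
    // A 32-bit process gets a 2 GB user-mode address space by default,
    // regardless of how much physical RAM the machine has.
    public static string Describe()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        int pointerBytes = IntPtr.Size;
        bool is64 = Environment.Is64BitProcess;
        return (is64 ? "64-bit" : "32-bit") + " process, pointer size " + pointerBytes + " bytes";
    }
}
```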

When doing string operations in .NET, you risk creating a lot of temporary strings because .NET strings are immutable. As a result, you may see memory usage rise quite dramatically.
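Here are some rough numbers. .NET strings are UTF-16, so each character takes 2 bytes. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes a single-byte (ASCII) file read with something like File.ReadAllText and then split on line breaks. The per-string overhead and reference size are parameters you supply, roughly 20 to 30 bytes and 8 bytes respectively on a 64-bit runtime. `SplitMemoryEstimate` is a hypothetical name.

```csharp
public static class SplitMemoryEstimate
{
    // Rough estimate of managed memory in use at the moment Split returns,
    // while both the whole-file string and the split pieces are still reachable.
    // Assumptions: single-byte (ASCII) input and UTF-16 strings (2 bytes per char).
    // perStringOverhead and referenceSize are caller-supplied guesses.
    public static long EstimateBytes(long fileBytes, long lineCount, int perStringOverhead, int referenceSize)
    {
        // One contiguous string holding the whole file.
        long wholeString = fileBytes * 2;

        // One new string per line. Counting the separators too slightly overestimates.
        long lineStrings = fileBytes * 2 + lineCount * perStringOverhead;

        // The string[] returned by Split: one reference per line, also one contiguous block.
        long array = lineCount * referenceSize;

        return wholeString + lineStrings + array;
    }
}
```

Take a hypothetical 120 MB file with 1,000,000 lines. `EstimateBytes(120_000_000, 1_000_000, 24, 8)` gives 512,000,000 bytes, about half a gigabyte, before any per-field splitting. The 240 MB whole-file string also has to fit in a single contiguous block.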

Brian Rasmussen 2009-04-30 21:30:07

Strings are the bastard child of computer science: a necessary evil, but I still wish someone would figure out a better way!

Darren Kopp 2009-04-30 22:17:19

Answer 2

+3 A:

You may not be able to allocate a single object with that much contiguous memory, nor should you expect to be able to. Streaming is the ordinary way to do this, but you're right that it might be slower (although I don't think it should usually be quite that much slower).

As a compromise, you could try reading a larger portion of the file (but still not the whole thing) at once, with a function like StreamReader.ReadBlock(), and processing each portion in turn.
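Below is a minimal sketch of that compromise. It assumes lines end in `\n`, optionally preceded by `\r`. `ChunkedReader`, `ReadLinesInBlocks` and the 1 MB default block size are illustrative choices and are not from the answer above. The key detail is carrying a partial line over from one block to the next.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ChunkedReader
{
    // Reads `reader` in blocks of `blockSize` chars via ReadBlock and yields complete lines.
    // Assumption: lines end in '\n', optionally preceded by '\r' (a lone '\r' is not a terminator).
    public static IEnumerable<string> ReadLinesInBlocks(TextReader reader, int blockSize = 1 << 20)
    {
        char[] buffer = new char[blockSize];
        var pending = new StringBuilder();   // partial line carried over between blocks
        int read;
        while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            int start = 0;
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] != '\n')
                    continue;
                pending.Append(buffer, start, i - start);
                yield return TrimCarriageReturn(pending);
                pending.Clear();
                start = i + 1;
            }
            pending.Append(buffer, start, read - start);   // keep the unfinished tail
        }
        if (pending.Length > 0)
            yield return TrimCarriageReturn(pending);      // last line had no trailing '\n'
    }

    // Returns the builder's contents without a single trailing '\r', if present.
    private static string TrimCarriageReturn(StringBuilder sb)
    {
        int length = sb.Length;
        if (length > 0 && sb[length - 1] == '\r')
            length--;
        return sb.ToString(0, length);
    }
}
```

Only one block and the current partial line are held in memory at a time, so the block size trades memory for fewer read calls.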

mquander 2009-04-30 21:30:35

Answer 3

A:

You should probably try the CLR Profiler to determine your actual memory usage. There may be memory limits other than your system RAM. For example, if this is an IIS application, your memory is limited by its application pool settings.

With this profile information you might find that you need to use a more scalable technique like the streaming of the CSV file that you originally attempted.
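Before reaching for a full profiler, a rough first check is to compare `GC.GetTotalMemory(true)` before and after a step. This is a hypothetical helper and gives only an approximation of managed heap use. It says nothing about fragmentation or native memory, which is where a profiler earns its keep.

```csharp
using System;

public static class MemoryProbe
{
    // Approximate number of managed-heap bytes still reachable after `allocate` runs.
    // GC.GetTotalMemory(true) forces a collection first, so garbage is excluded.
    public static long MeasureRetainedBytes(Func<object> allocate)
    {
        long before = GC.GetTotalMemory(true);
        object result = allocate();
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(result); // keep the result reachable until after the second reading
        return after - before;
    }
}
```

For example, measuring `() => File.ReadAllText(path).Split('\n')` reports only what the split array and its pieces retain. The whole-file string has already become garbage by the second reading, so the peak usage was higher than the number returned.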

Keltex 2009-04-30 21:31:51

Answer 4

+4 A:

If you've already read the whole file into a string, you should probably use a StringReader rather than String.Split().

// Requires using System.IO; fileContents is the whole file, already in memory.
using (StringReader reader = new StringReader(fileContents)) {
    string line;
    while ((line = reader.ReadLine()) != null) {
        // Process line
    }
}

This should be roughly the same as streaming from a file, with the difference that the contents are already in memory.

Edit after testing

I tried the above with a 140 MB file, where the processing consisted of adding line.Length to a length variable. This took around 1.6 seconds on my computer. After this, I tried the following:

long length = 0;
// StreamReader (System.IO) buffers the file; 'using' closes it when done.
using (StreamReader reader = new StreamReader("D:\\test.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
        length += line.Length;
}

The result was around 1 second.

Of course, your mileage may vary, especially if you are reading from a network drive or your processing takes long enough for the hard drive to seek somewhere else. It also matters whether you're reading the file with a raw FileStream and not buffering. StreamReader provides buffering, which greatly speeds up reading.
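To repeat that comparison on your own machine, you could use something like the following. It is a sketch: `ReadBenchmark` and its methods are hypothetical names. The timings it reports depend entirely on the machine and disk, and a single run includes JIT and file-cache effects.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ReadBenchmark
{
    // Sums line lengths from a string that is already in memory.
    public static long SumFromString(string contents)
    {
        long length = 0;
        using (var reader = new StringReader(contents))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                length += line.Length;
        }
        return length;
    }

    // Sums line lengths while streaming the file through StreamReader's buffer.
    public static long SumFromFile(string path)
    {
        long length = 0;
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                length += line.Length;
        }
        return length;
    }

    // Runs `work` once and returns how long it took, passing its result back out.
    public static TimeSpan Time(Func<long> work, out long result)
    {
        Stopwatch watch = Stopwatch.StartNew();
        result = work();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

For a fair comparison, the in-memory case should be timed as `() => ReadBenchmark.SumFromString(File.ReadAllText(path))`. That way the cost of loading the file is counted on both sides.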

Mikko Rantanen 2009-04-30 21:33:56

This is a pretty good answer if he can actually read the file into a string in the first place, which it sounds like he can, at least at the moment. I wouldn't be surprised if many machines failed immediately trying to load up a 120MB file (or failed sometimes and worked other times.)

mquander 2009-04-30 21:37:56

Answer 5

+7 A:

Don't roll your own parser unless you have to. I've had luck with this one:

A Fast CSV Reader

If nothing else you can look under the hood and see how someone else does it.
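To see one thing a real parser handles under the hood, note that plain String.Split(',') breaks quoted fields that contain commas. The sketch below is not the Fast CSV Reader's code. It is a hypothetical, minimal field splitter for a single record. It assumes comma separators, double-quote quoting with "" as an escaped quote, and no line breaks inside quoted fields.

```csharp
using System.Collections.Generic;
using System.Text;

public static class SimpleCsv
{
    // Splits one CSV record into fields, honouring double-quoted fields
    // (which may contain commas) and "" as an escaped quote inside them.
    // Assumption: the record contains no embedded line breaks.
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"') { field.Append('"'); i++; } // escaped quote
                    else inQuotes = false;                                                    // closing quote
                }
                else field.Append(c);
            }
            else if (c == '"') inQuotes = true;
            else if (c == ',') { fields.Add(field.ToString()); field.Clear(); }
            else field.Append(c);
        }
        fields.Add(field.ToString()); // the last field has no trailing comma
        return fields;
    }
}
```

For the record `a,"b,c","say ""hi""",` this returns four fields (`a`, `b,c`, `say "hi"` and an empty one), whereas String.Split(',') returns five pieces.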

Jay Riggs 2009-04-30 21:34:58

+1 as I have used this to parse large CSV files as well.

Wayne 2009-04-30 21:40:04

+1 from me too. In my experience Sébastien Lorion's CSV reader is efficient, flexible and robust. It should chew through a 120MB file in no time.

LukeH 2009-04-30 21:49:11

Answer 6

A:

You're running out of memory on the stack, not the heap.

You could try refactoring your app so that you process the input in more manageable "chunks" of data rather than processing 120 MB at a time.

Garrett 2009-04-30 21:37:16

Strings are allocated on the heap, not the stack. Only primitives such as int, byte, double, etc. are ever allocated on the stack, IIRC.

Not Sure 2009-04-30 21:39:48

@not sure: You're correct. However, there are a variety of non-obvious circumstances in which the program stack can fill up. Given that the system in question has ample physical memory, I assume this is probably one of those cases. =)

Garrett 2009-05-01 12:33:19

The stack filling up results in a StackOverflowException, not an OutOfMemoryException; the latter is always used to indicate insufficient memory on the GC Heap.

Not Sure 2009-05-15 19:32:52

Answer 7

+1 A:

As other posters say, the OutOfMemoryException occurs because the runtime cannot find a contiguous chunk of memory of the requested size.

However, you say that doing the parsing line by line was several times slower than reading it all in at once and then doing your processing. That only makes sense if you were taking the naive approach of doing blocking reads, e.g.:

while (!file.EndOfStream)      // file: a StreamReader over the CSV
{
    string line = file.ReadLine();
    ProcessLine(line);         // your per-line work; the next read waits until this returns
}

You should instead use streaming, where your stream is filled by Write() calls from another thread that reads the file. That way the file read is not blocked by whatever your ProcessLine() does, and vice versa. This should be on par with the performance of reading the entire file at once and then doing your processing.
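Below is a minimal sketch of that producer/consumer idea. It uses BlockingCollection<T> (.NET 4.0) and Task.Run (.NET 4.5), both newer than this thread, instead of raw Write() calls on a stream. `PipelinedReader`, `ProcessFile` and the default capacity are hypothetical. The bounded capacity stops the reader from running far ahead and buffering the whole file in memory.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class PipelinedReader
{
    // Reads lines on a background task while the calling thread runs processLine,
    // so file I/O and processing overlap. boundedCapacity caps how many lines
    // can be waiting in memory at once.
    public static void ProcessFile(TextReader source, Action<string> processLine, int boundedCapacity = 10000)
    {
        var queue = new BlockingCollection<string>(boundedCapacity);
        var cancel = new CancellationTokenSource();

        // Producer: reads ahead, blocking in Add whenever the queue is full.
        Task producer = Task.Run(() =>
        {
            try
            {
                string line;
                while ((line = source.ReadLine()) != null)
                    queue.Add(line, cancel.Token);
            }
            finally
            {
                queue.CompleteAdding(); // lets the consumer's foreach finish
            }
        });

        // Consumer: the calling thread processes lines as they arrive.
        try
        {
            foreach (string line in queue.GetConsumingEnumerable())
                processLine(line);
        }
        catch
        {
            cancel.Cancel(); // unblock a producer stuck in Add before rethrowing
            throw;
        }

        producer.Wait(); // rethrows any read error, wrapped in an AggregateException
    }
}
```

Usage would be along the lines of `using (var reader = new StreamReader(path)) PipelinedReader.ProcessFile(reader, ProcessLine);`, where ProcessLine is your own method.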

Not Sure 2009-04-30 21:43:03

Could you give a code example of the multi-threaded approach? I was doing it the naive way, and I now understand why that could be a major problem.

Craig 2009-05-15 19:08:12

.NET has built-in asynchronous file reading and writing; a good starting point is the BeginRead() method. The following Google search has many examples: http://www.google.com/search?q=.net+asynchronous+file
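Here is one way BeginRead() could be used for this, sketched under some assumptions. The file is read as raw byte chunks, and decoding to text and splitting into lines are left out. `OverlappedReader` and `ReadAll` are hypothetical names, and error handling is minimal. Only one read is outstanding at a time. The next read starts before the current chunk is processed, so disk I/O overlaps with the work.

```csharp
using System;
using System.IO;

public static class OverlappedReader
{
    // Reads `path` in chunks with BeginRead/EndRead, issuing the next read before
    // handing the current chunk to `processChunk` (double buffering).
    // processChunk must not keep the array after returning: the buffer is reused.
    // Returns the total number of bytes read.
    public static long ReadAll(string path, Action<byte[], int> processChunk, int chunkSize = 64 * 1024)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, FileOptions.Asynchronous))
        {
            byte[] current = new byte[chunkSize];
            byte[] next = new byte[chunkSize];
            long total = 0;

            IAsyncResult pending = stream.BeginRead(current, 0, chunkSize, null, null);
            while (true)
            {
                int read = stream.EndRead(pending);   // wait for the outstanding read into `current`
                if (read == 0)
                    break;                            // end of file, nothing outstanding

                // Start filling the other buffer while this chunk is processed.
                pending = stream.BeginRead(next, 0, chunkSize, null, null);
                processChunk(current, read);
                total += read;

                // The buffer being filled becomes the one to process next time round.
                byte[] swap = current;
                current = next;
                next = swap;
            }
            return total;
        }
    }
}
```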

Not Sure 2009-05-15 19:29:40

Answer 8

A:

I agree with almost everybody here: you need to use streaming.

I don't know if anybody has said so yet, but you should look at an extension method.

And I know, for sure, hands down, the best CSV splitting technique on .NET / CLR is this one

That technique generated 10+ GB of XML output from CSV input for me, including extensive input filters and all, faster than anything else I've seen.
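The linked technique isn't reproduced here. As a generic illustration of combining an extension method with streaming, a hypothetical `SplitRecords` extension on TextReader can yield one split record at a time, so only the current line is held in memory. It uses a naive String.Split per line, so it does not handle quoted fields.

```csharp
using System.Collections.Generic;
using System.IO;

public static class CsvExtensions
{
    // Lazily yields the fields of each line; nothing is read until the caller enumerates.
    // Naive: splits on `separator` only, with no handling of quoted fields.
    public static IEnumerable<string[]> SplitRecords(this TextReader reader, char separator = ',')
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line.Split(separator);
    }
}
```

Usage could look like `using (var reader = new StreamReader(path)) foreach (string[] fields in reader.SplitRecords()) { ... }`.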

RandomNickName42 2009-05-15 08:32:26

Oh, right, also: streaming beats buffering in RAM no matter what. Think about it: if you have 4 GB and you load up 2 GB of input, the load time, the thrashing of your VM subsystem relocating pages, and the massive size of your page table will eat up your CPU cache, etc. Working in and out of a small, easy-to-manage workspace keeps your cache "hot", so all your CPU time is devoted to the task at hand rather than to massive fluctuations in system load.

RandomNickName42 2009-05-15 08:35:18

Answer 9

A:

http://csvhelper.com

You can set the size of your reader buffer.
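CsvHelper's own configuration options vary between versions and aren't shown here. The same idea applies to the StreamReader underneath any CSV reader, because one of its constructors takes a bufferSize argument. Below is a minimal sketch. `BufferedLines.Count` is a hypothetical name, and the best buffer size is something to measure rather than assume.

```csharp
using System.IO;
using System.Text;

public static class BufferedLines
{
    // Counts lines in `path` using a StreamReader with an explicit buffer size.
    // bufferSize is a tuning knob to measure against your own data and disk.
    public static long Count(string path, int bufferSize)
    {
        long count = 0;
        using (var reader = new StreamReader(path, Encoding.UTF8, true, bufferSize))
        {
            while (reader.ReadLine() != null)
                count++;
        }
        return count;
    }
}
```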

Josh Close 2010-02-22 23:52:26
