After testing my app with Instruments I realized that the current CSV parser I use has a huge memory footprint. Does anybody have a recommendation for one with a low memory footprint?

+1  A: 

There are some other CSV parsers to try:

You could experiment to see whether either has lower memory overhead.

Neither of these supports event-based parsing. In event-based parsing, you never load the whole source file into memory, only enough of it to read the current row (which also lets you parse a download while it is still in progress). You must handle each row as it is read and make certain that all data from the source is freed between rows.

This would be the theoretical lowest overhead solution. If you really need low overhead, you should adapt an existing solution to work this way (I don't have any advice on how this would be done).
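
For illustration only, here is a minimal sketch of the event-based idea in plain C (which also compiles inside an Objective-C project). It is not taken from any existing parser: `csv_each_row`, `csv_row_fn`, `csv_tally` and `tally_row` are hypothetical names, and the sketch assumes comma separators, double-quote escaping, and no embedded NUL bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Row callback: `fields` points into a buffer that is reused for the next
   row, so copy anything that has to outlive the call. */
typedef void (*csv_row_fn)(char **fields, size_t count, void *ctx);

typedef struct {
    char *text;     size_t len, cap;   /* bytes of the current row only */
    size_t *starts; size_t n, ncap;    /* offset of each field in text */
    size_t field_start;                /* where the field in progress begins */
} csv_row;

static int put_char(csv_row *r, char ch)
{
    if (r->len == r->cap) {
        size_t cap = r->cap ? r->cap * 2 : 128;
        char *p = realloc(r->text, cap);
        if (!p) return -1;
        r->text = p; r->cap = cap;
    }
    r->text[r->len++] = ch;
    return 0;
}

/* Terminates the current field and remembers where it started. */
static int end_field(csv_row *r)
{
    if (r->n == r->ncap) {
        size_t cap = r->ncap ? r->ncap * 2 : 16;
        size_t *p = realloc(r->starts, cap * sizeof *p);
        if (!p) return -1;
        r->starts = p; r->ncap = cap;
    }
    if (put_char(r, '\0') != 0) return -1;
    r->starts[r->n++] = r->field_start;
    r->field_start = r->len;
    return 0;
}

/* Hands the finished row to the callback, then resets the buffers for reuse. */
static int emit_row(csv_row *r, csv_row_fn on_row, void *ctx)
{
    if (end_field(r) != 0) return -1;
    char **fields = malloc(r->n * sizeof *fields);
    if (!fields) return -1;
    for (size_t i = 0; i < r->n; i++) fields[i] = r->text + r->starts[i];
    on_row(fields, r->n, ctx);
    free(fields);
    r->len = r->n = r->field_start = 0;
    return 0;
}

/* Hypothetical entry point: returns 0 on success, -1 if allocation fails. */
static int csv_each_row(FILE *in, csv_row_fn on_row, void *ctx)
{
    assert(in != NULL && on_row != NULL);
    csv_row r = {0};
    int c, in_quotes = 0, pending = 0, rc = 0;

    while (rc == 0 && (c = fgetc(in)) != EOF) {
        if (in_quotes) {
            if (c != '"') { rc = put_char(&r, (char)c); continue; }
            int next = fgetc(in);
            if (next == '"') rc = put_char(&r, '"');   /* "" is a literal quote */
            else { in_quotes = 0; if (next != EOF) ungetc(next, in); }
        } else if (c == '"') {
            in_quotes = pending = 1;
        } else if (c == ',') {
            pending = 1;
            rc = end_field(&r);
        } else if (c == '\n') {
            if (pending) rc = emit_row(&r, on_row, ctx);  /* blank lines are skipped */
            pending = 0;
        } else if (c != '\r') {
            pending = 1;
            rc = put_char(&r, (char)c);
        }
    }
    if (rc == 0 && pending) rc = emit_row(&r, on_row, ctx);  /* no final newline */
    free(r.text);
    free(r.starts);
    return rc;
}

/* Example callback: tallies rows, fields and bytes without keeping any data. */
typedef struct { size_t rows, fields, bytes; } csv_tally;

static void tally_row(char **fields, size_t count, void *ctx)
{
    csv_tally *t = ctx;
    t->rows++;
    t->fields += count;
    for (size_t i = 0; i < count; i++) t->bytes += strlen(fields[i]);
}
```

Calling `csv_each_row(fp, tally_row, &tally)` on an open `FILE *` walks the file one row at a time. The row buffer is reused between rows, so it only ever grows to the size of the longest row seen so far, not to the size of the file.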

Matt Gallagher
Thanks, Matt. I decided to go with Mark's solution for the moment because it requires less testing than switching parsers. But switching to an event-based parser is now on my roadmap. Libcsv (http://sourceforge.net/projects/libcsv/) seems like the one.
catlan
+2  A: 

You probably should do this row-by-row, rather than reading the whole file, parsing it, and returning an array with all the rows in it. In any case, the code you linked to produces zillions of temporary objects in a loop, which means it'll have very high memory overhead.

A quick fix would be to create an NSAutoreleasePool at the top of the loop, and drain it at the bottom:

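while ( ![scanner isAtEnd] ) {
    // Fresh pool for each iteration, so temporaries do not pile up.
    NSAutoreleasePool *innerPool = [[NSAutoreleasePool alloc] init];

    ... bunch of code...

    // Releases every object that was autoreleased during this iteration.
    [innerPool drain];
}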

This will wipe out the temporary objects at the end of each iteration, so your memory usage will be the size of the data, plus an object for each string in the file (roughly 8 bytes * rows * columns).

Mark Bessey
A: 

It's not a CSV parser, but my open source Cocoa ParseKit framework has a powerful, convenient, and configurable string tokenizer which might be handy for CSV or other types of parsing and tokenizing.

The framework:

http://parsekit.com

Some usage documentation:

http://parsekit.com/tokenization.html

The PKTokenizer class:

http://github.com/itod/parsekit/blob/master/include/ParseKit/PKTokenizer.h
http://github.com/itod/parsekit/blob/master/src/PKTokenizer.m

Todd Ditchendorf