Hi,

I am using the expat parser to parse an XML file of around 15 GB. The problem is that it throws an "Out of Memory" error and the program aborts.

I want to know whether anybody has faced a similar issue with the expat parser, or whether it is a known bug that has been fixed in later versions.

+1  A: 

I don't know expat at all, but I'd guess that it's having to hold too much state in memory for some reason. Is the XML malformed in some way? Do you have handlers registered for end tags of large blocks?

I'm thinking that if you have a handler registered for the end of a large block, and expat is expected to pass the block to the handler, then expat could be running out of memory before it's able to completely gather that block. As I said, I don't know expat, so this might not be possible, I'm just asking.

Alternatively, are you sure that expat is where the memory is going? I could imagine a situation where you are keeping some information about the contents of the XML file, and your own data structures cause the out-of-memory condition, either because the data is so large or because of memory leaks in your code.

Michael Kohne
Actually, I am running out of virtual memory; it is exceeding 2 GB. Is memory allocated on the heap the only thing responsible for the increase in virtual memory size?
sameer karjatkar
sameer - It could be running out of stack space if you have VERY deeply nested elements, but it's almost certainly a heap space problem.
Eric Petroelje
Everything in your process (heap, stack, program code, etc.) is part of your process's virtual memory footprint. You might want to try parsing the file with no handlers registered (or just one on the start of something) and see whether expat fails or not.
Michael Kohne
+2  A: 

I've used expat to parse large files before and never had any problems. I'm assuming you're using SAX and not one of the expat DOM wrappers. If you are using DOM, then that's your problem right there - it would be essentially trying to load the whole file into memory.

Are you allocating objects as you parse the XML and maybe not deallocating them? That would be the first thing I would check for. One way to check if the problem is really with expat or not - if you reduce the program to a simple version that has empty tag handlers (i.e. it just parses the file and does nothing with the results) does it still run out of memory?
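
To make that check concrete, here is a minimal sketch of such a stripped-down test. It uses Python's standard-library `xml.parsers.expat` binding only as a stand-in, since the question doesn't say which language or wrapper is in use; the function name `parse_with_noop_handlers` and the small in-memory sample are made up for illustration. The handlers throw everything away, so if memory still grows on the real file, the parser side is worth a closer look; if it stays flat, the growth is coming from your own handlers.

```python
import io
import xml.parsers.expat


def parse_with_noop_handlers(stream, chunk_size=64 * 1024):
    """Feed a binary stream to expat in chunks, discarding every event.

    Returns the number of bytes read. (Hypothetical helper for illustration.)
    """
    parser = xml.parsers.expat.ParserCreate()

    # Handlers that deliberately keep nothing.
    parser.StartElementHandler = lambda name, attrs: None
    parser.EndElementHandler = lambda name: None
    parser.CharacterDataHandler = lambda data: None

    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # False: more data will follow
        total += len(chunk)
    parser.Parse(b"", True)  # True: signal end of document
    return total


# Small stand-in document; for the real test, open the large file in binary
# mode and pass the file object instead.
sample = b"<root>" + b"<item id='1'>text</item>" * 1000 + b"</root>"
bytes_read = parse_with_noop_handlers(io.BytesIO(sample))
```

For the real test you would pass something like `open(path, "rb")` and watch the process's memory while it runs.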

Eric Petroelje
I am quite new to parsing. Are there different APIs for a SAX parser? From the APIs used, I am not able to confirm whether it's using a SAX or a DOM parser.
sameer karjatkar
With a SAX parser, you'll be writing methods like "begin_element", "end_element", "characters", etc. and managing state yourself as you parse. With a DOM parser it will parse the whole document at once and you will be able to "browse" through the document tree in your code.
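
A rough sketch of the difference, again using Python's standard library purely for illustration (your code is probably in another language, and the class name `TitleCollector` and the sample document are invented here). The first half is the event-driven style: you register callbacks and track state yourself. The second half is the DOM style: the whole document is built as a tree in memory before you touch it.

```python
import xml.parsers.expat
from xml.dom import minidom

DOC = b"<library><book><title>A</title></book><book><title>B</title></book></library>"


# Event-driven (SAX-style): callbacks fire as the parser walks the input,
# and the code keeps only the state it needs.
class TitleCollector:
    def __init__(self):
        self.in_title = False
        self.titles = []

    def start_element(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.titles.append("")

    def end_element(self, name):
        if name == "title":
            self.in_title = False

    def characters(self, data):
        # Character data may arrive in several pieces, so append.
        if self.in_title:
            self.titles[-1] += data


collector = TitleCollector()
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = collector.start_element
parser.EndElementHandler = collector.end_element
parser.CharacterDataHandler = collector.characters
parser.Parse(DOC, True)
sax_titles = collector.titles

# DOM-style: the entire document becomes an in-memory tree first,
# then you browse it. This is what cannot scale to a 15 GB file.
tree = minidom.parseString(DOC)
dom_titles = [node.firstChild.data for node in tree.getElementsByTagName("title")]
```

If your code looks like the first half (handlers registered on a parser object), you are using the event-driven API; if it looks like the second (asking a document object for child nodes), you are going through a DOM wrapper.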
Eric Petroelje
+1  A: 

Expat is an event-driven parser which does not construct large in-memory structures. So it's probably not expat (which is very widely used for parsing large files) that is the problem - much more likely it is your own code.
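
As an illustration of what that means for your own code, here is a small sketch (Python's standard-library expat binding, with a made-up `summarize` helper and sample data) of handlers whose state does not grow with the size of the file: a counter per distinct tag name and a depth tracker. This is only an assumption about the kind of processing you might be doing; the point is that if your handlers instead append every element or text chunk to a list, that list, not the parser, is what will eat the memory.

```python
import io
import xml.parsers.expat
from collections import Counter


def summarize(stream, chunk_size=64 * 1024):
    """Count elements per tag name and track the maximum nesting depth.

    State held here is bounded by the number of distinct tag names,
    not by the size of the input. (Hypothetical helper for illustration.)
    """
    counts = Counter()
    depth = 0
    max_depth = 0

    def start(name, attrs):
        nonlocal depth, max_depth
        counts[name] += 1
        depth += 1
        max_depth = max(max_depth, depth)

    def end(name):
        nonlocal depth
        depth -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end

    # Read fixed-size chunks until the stream is exhausted.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        parser.Parse(chunk, False)
    parser.Parse(b"", True)
    return counts, max_depth


sample = b"<root>" + b"<row><cell>x</cell></row>" * 500 + b"</root>"
counts, max_depth = summarize(io.BytesIO(sample))
```

The depth figure is also a cheap way to rule out the "very deeply nested elements" scenario mentioned in the comments above.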

anon