Hi, I am working on an application with the following requirements:

1. Download a ZIP file from a server.
2. Uncompress the ZIP file and read its content (which is in XML format) into a String.
3. Pass this content to another method for parsing and further processing.

Now, my concern is that the XML file may be huge, say 100 MB, and my JVM has only 512 MB of memory. How can I read this content in chunks, pass it on for parsing, and then insert the data into PL/SQL tables?

Since there can be multiple requests running at the same time, what is the best way to process this within 512 MB of memory? How can I read the data in chunks and pass it as a stream to the XML parser?

I googled this but didn't find any implementation. :(

Thanks,

+2  A: 

Any SAX parser should work since it won't load the entire XML file into memory like a DOM parser.
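As a rough sketch of how that could look for the ZIP case in the question: `ZipInputStream` can be handed straight to a SAX parser, so the XML is decompressed and parsed as it is read and never has to exist as a single String. The class name `ZipSaxStreaming`, the `CountingHandler`, and the assumption that the XML is the first entry in the archive are all made up for illustration; real code would do its per-element work in the handler.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams the first ZIP entry into a SAX parser.
class ZipSaxStreaming {

    // Placeholder handler that only counts start tags;
    // real processing would go in these callbacks.
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    // Assumes the XML document is the first entry of the archive.
    static int countElements(InputStream zipped) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry = zip.getNextEntry();
            if (entry == null) {
                throw new IOException("ZIP archive is empty");
            }
            CountingHandler handler = new CountingHandler();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // The parser pulls bytes from the entry as it needs them,
            // so only a small buffer is held in memory at any time.
            parser.parse(zip, handler);
            return handler.elements;
        }
    }
}
```

The `zipped` argument could be the stream returned by the HTTP connection used for the download, in which case downloading, unzipping, and parsing all happen in one pass.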

Taylor Leese
+3  A: 

Java's XMLReader is a SAX2 parser. Where a DOM parser reads the whole of the XML file in and creates an (often large) data structure (usually a tree) to represent its contents, a SAX parser lets you register a handler that will be called when pieces of the XML document are recognized. In that call-back code, you can save only enough data to do what you need -- e.g. you might save all the fields that will end up as a single row in the database, insert that row and then discard the data. With this type of design, your program's memory consumption depends less on the file size than on the complexity and size of a single logical data item (in your case, the data that will become one row in the database).
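A minimal sketch of that row-at-a-time design, under some loud assumptions: the element name `row`, the flat layout (each child of `<row>` holds plain text for one column), and the `Consumer` sink are all hypothetical, since the question does not show the actual XML. In a real application the sink would perform the database insert.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the fields of one <row>, hands them
// to a sink, then drops them so memory use stays at one row.
class RowHandler extends DefaultHandler {
    private final Consumer<Map<String, String>> sink;
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> row; // non-null only while inside <row>

    RowHandler(Consumer<Map<String, String>> sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if (qName.equals("row")) {
            row = new LinkedHashMap<>();
        }
        text.setLength(0); // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node, so append.
        if (row != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("row")) {
            sink.accept(row); // e.g. insert into the database here
            row = null;       // discard the row's data
        } else if (row != null) {
            row.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }

    // Wires the handler to an XMLReader and runs the parse.
    static void parse(InputSource in, Consumer<Map<String, String>> sink)
            throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new RowHandler(sink));
        reader.parse(in);
    }
}
```

Only the current row's map and one text buffer are alive at any point, which is why memory use tracks the size of a row rather than the size of the file.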

Even if you did use a DOM-style parser, things might not be quite as bad as you expect. XML is pretty verbose, so (depending on how it's structured and such) a 100 MB file will often represent only 10-20 MB of data, and as little as 5 MB of data wouldn't be particularly rare or unbelievable.

Jerry Coffin
Thanks for your reply. But my primary requirement is that I don't want to load all of the data into memory; instead I have to read the XML file in small chunks and pass them to another application for processing, to avoid out-of-memory errors.
Manish Dhanotiya
@Manish: that's why I (and the other people who gave you replies) recommended a SAX-style parser -- it never attempts to load the whole file into memory.
Jerry Coffin
Hi Jerry, actually my requirement is to download the file from a server and get the XML data into a String, which I then have to pass to another application that parses the XML. Is there some mechanism by which I can pass chunks of data to that other application while the file is still downloading from the server?
Manish Dhanotiya
You should look at vtd-xml (of which I am the author).
vtd-xml-author