ansaurus

Question

How to access a subset of XML data in Java when the XML data is too large to fit in memory?

Answer 1

A:

To parse part of a document without loading the whole thing into memory, you need a streaming parser such as SAX.

Here are some official Sun examples of how to use SAX: http://java.sun.com/developer/codesamples/xml.html#sax
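A minimal sketch of SAX streaming, not taken from the linked samples. It uses only the JDK's `SAXParserFactory` and `DefaultHandler`, and assumes a hypothetical input in which `book` elements carry an `id` attribute; `SaxScan` and `bookIds` are illustrative names. The handler is called once per element, and the parser itself does not keep the document in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxScan {
    /** Collects the "id" attribute of every <book> element without building a tree. */
    static List<String> bookIds(InputStream in) throws Exception {
        List<String> ids = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Only the current element's name and attributes are visible here;
                // the parser does not retain earlier or later parts of the document.
                if ("book".equals(qName)) {
                    ids.add(atts.getValue("id"));
                }
            }
        });
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book id=\"a\"/><shelf><book id=\"b\"/></shelf></library>";
        System.out.println(bookIds(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
        // prints "[a, b]"
    }
}
```

For a real file, pass `Files.newInputStream(path)` instead of the in-memory string.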

Juriy 2010-05-10 11:29:03

I won't know exactly which nodes I want until runtime, and they could have any structure in terms of child nodes, attributes, CDATA, etc., so I'd have to build the DOM document myself. I wondered whether something already exists to save me from doing this. I looked at StAX, but it doesn't quite seem to offer this.

Michael Jones 2010-05-10 11:31:36

@Michael Jones - SAX is your beast.

Romain Hippeau 2010-05-10 11:36:46

Is there a way in SAX to access a complete element when the end element event is called? For instance, if I know I'm looking for the next "book" node, it would be great if, when SAX fires the end element event, I could just write "return xml.getElement()" and get back the whole node. As far as I can tell, I can only access the parts of an individual element (e.g. attributes, CDATA) rather than the whole node and its children.

Michael Jones 2010-05-10 11:42:33

Answer 2

A:

You could do the following:

1. Scan the XML file using SAX or StAX and immediately serialize everything back into a StringBuilder, i.e. create your own copy of the XML file.
2. When you encounter an endElement and know you don't need the subtree you just parsed, clear the StringBuilder.
3. If you do need it, build a DOM tree from the "copy" you created.
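A minimal SAX sketch of the steps above. As a simplification, it only starts copying when it reaches a known boundary element (the hypothetical `book` element from the question) instead of copying everything and clearing the buffer afterwards. `SubtreeExtractor` and `extract` are illustrative names; namespaces are ignored, comments and processing instructions are dropped, and CDATA sections are kept only as escaped text.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SubtreeExtractor extends DefaultHandler {
    private final String target;                 // boundary element name, e.g. "book"
    private final Consumer<Element> onSubtree;   // receives each finished subtree as DOM
    private final DocumentBuilder domBuilder;
    private final StringBuilder copy = new StringBuilder(); // serialized copy of the current subtree
    private int depth = 0;                       // > 0 while inside a target element

    SubtreeExtractor(String target, Consumer<Element> onSubtree) throws Exception {
        this.target = target;
        this.onSubtree = onSubtree;
        this.domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    /** Streams the input and passes every target subtree to the callback as a DOM Element. */
    static void extract(InputStream in, String target, Consumer<Element> onSubtree) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new SubtreeExtractor(target, onSubtree));
    }

    // Re-escapes text so the copy is well-formed XML again ("&" must be replaced first).
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth == 0 && !qName.equals(target)) {
            return; // outside any wanted subtree: copy nothing
        }
        depth++;
        copy.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            copy.append(' ').append(atts.getQName(i)).append("=\"")
                .append(escape(atts.getValue(i))).append('"');
        }
        copy.append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            copy.append(escape(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth == 0) {
            return;
        }
        copy.append("</").append(qName).append('>');
        if (--depth == 0) {
            // Subtree complete: build a DOM tree from the copy, then clear the buffer.
            try {
                onSubtree.accept(domBuilder.parse(new InputSource(new StringReader(copy.toString())))
                        .getDocumentElement());
            } catch (Exception e) {
                throw new SAXException(e);
            }
            copy.setLength(0);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book id=\"1\"><title>A &amp; B</title></book><magazine/>"
                + "<book id=\"2\"><title><![CDATA[x < y]]></title></book></library>";
        extract(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), "book",
                book -> System.out.println(book.getAttribute("id") + ": " + book.getTextContent()));
        // prints "1: A & B" and then "2: x < y"
    }
}
```

Each `book` reaches the callback as an ordinary DOM `Element`, so existing DOM or XPath code can work on it, while at most one subtree is buffered at a time.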

This lets you rely on standard frameworks: one for conventional SAX parsing and one for conventional DOM building. Only the custom serialization might require some hacking.

It also helps if you know the subtree boundaries in advance (the book elements in your example); otherwise, further processing would be required.
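The same approach can be sketched with StAX, where a standard `XMLStreamWriter` handles the serialization and escaping, so less of it has to be hand-written. This is an illustrative variant, not part of the original answer: `StaxSubtreeCopy` and `firstSubtree` are hypothetical names, it stops at the first matching element (the "next book node" case from the question), and it assumes the document does not use namespaces, since prefixes and namespace declarations are dropped.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class StaxSubtreeCopy {
    /** Returns the first element named {@code target} as a DOM Element, or null if none exists. */
    static Element firstSubtree(Reader in, String target) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT || !r.getLocalName().equals(target)) {
                    continue; // skip everything before the wanted element
                }
                StringWriter buf = new StringWriter();
                XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(buf);
                int depth = 0;
                do {
                    switch (r.getEventType()) {
                        case XMLStreamConstants.START_ELEMENT:
                            depth++;
                            w.writeStartElement(r.getLocalName());
                            for (int i = 0; i < r.getAttributeCount(); i++) {
                                w.writeAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
                            }
                            break;
                        case XMLStreamConstants.END_ELEMENT:
                            depth--;
                            w.writeEndElement();
                            break;
                        case XMLStreamConstants.CHARACTERS:
                        case XMLStreamConstants.CDATA:
                        case XMLStreamConstants.SPACE:
                            w.writeCharacters(r.getText()); // the writer escapes &, < and >
                            break;
                        default:
                            break; // comments and processing instructions are dropped
                    }
                    if (depth > 0) {
                        r.next(); // advance only while still inside the subtree
                    }
                } while (depth > 0);
                w.flush();
                // Only this one subtree is held in memory, first as text, then as DOM.
                return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(buf.toString())))
                        .getDocumentElement();
            }
            return null;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><magazine/><book id=\"7\"><title>Java &amp; XML</title></book></library>";
        Element book = firstSubtree(new StringReader(xml), "book");
        System.out.println(book.getAttribute("id") + " " + book.getTextContent()); // prints "7 Java & XML"
    }
}
```

For a large file, pass a `Reader` over the file (for example from `Files.newBufferedReader`) instead of a `StringReader`.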

DR 2010-05-10 12:26:33

I think this is my best option. I'll give it a go. Thanks.

Michael Jones 2010-05-10 15:12:49
