tags:

views:

94

answers:

2

What I would really like is a streaming API that works sort of like StAX, and sort of like DOM/JDom.

It would be streaming in the sense that it would be very lazy and not read things in until needed. It would also be streaming in the sense that it would read everything forwards (but not backwards).

Here's what code that used such an API would look like.

URL url = ...
XMLStream xml = XXXFactory(url.openStream());


// process each <book> element in this document.
// the <book> element may have subnodes.
// You get a DOM/JDOM like tree rooted at the next <book>.


while (xml.hasContent()) {
  XMLElement book = xml.getNextElement("book");
  processBook(book);
}

Does anything like this exist?

+1  A: 

The only way to parse part of a document without loading all of it into memory is to use a streaming parser such as SAX.

Here are some official Sun examples of how to use SAX: http://java.sun.com/developer/codesamples/xml.html#sax
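For illustration, here is a minimal sketch of event-driven SAX parsing, using only the JDK's built-in parser. The class name `SaxTitles`, the `titles` helper, and the sample document are made up for this example. Note that the handler only ever sees one event at a time, so any state (here, the text of the current `<title>`) has to be accumulated by hand.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of any library.
class SaxTitles {
    /** Collects the text of every <title> element, one SAX event at a time. */
    static List<String> titles(String xml) throws Exception {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <title>

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("title".equals(qName)) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so append.
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("title".equals(qName)) {
                    found.add(text.toString());
                    text = null;
                }
            }
        };
        // The default factory is not namespace-aware, so qName is the plain tag name.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book><title>Dune</title></book>"
                   + "<book><title>Emma</title></book></library>";
        System.out.println(titles(xml)); // prints [Dune, Emma]
    }
}
```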

Juriy
I don't know exactly which nodes I'll want to find until runtime, and they could have any structure in terms of child nodes, attributes, CDATA, etc. I'd have to build the DOM document myself. I wondered whether something might exist to save me having to do this. I saw StAX, but it doesn't quite seem to offer this.
Michael Jones
@Michael Jones - SAX is your beast.
Romain Hippeau
Is there a way in SAX to access a complete element when the end element event is called? For instance, if I know I'm looking for the next "book" node, if when SAX fired the end element event I could just write "return xml.getElement()" and it returned the whole node, that'd be great. As far as I can tell I can only access properties of an individual element (e.g. attributes, CDATA, etc) instead of the whole node and its children.
Michael Jones
A: 

You could do the following:

  1. Scan the XML file using SAX or StAX and immediately serialize everything back into a StringBuilder, i.e. create your own copy of the XML file.

  2. If you encounter an endElement and you know you don't need the subtree you just parsed, clear the StringBuilder.

  3. If you need it, you can build a DOM tree from the "copy" you created.

With this you can fall back on standard frameworks, one for conventional SAX parsing and one for conventional DOM building. Only the custom serialization might require some hacking.

It also helps if you know the tree boundaries in advance (the book elements in your example). Otherwise further processing would be required.
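The steps above could be sketched roughly as follows, using the JDK's StAX event API for the scan and a standard `DocumentBuilder` for the DOM. This is only one possible reading of the idea, with some assumptions: the class name `SubtreeReader` and the `subtrees` helper are invented for the example, a `StringWriter` stands in for the StringBuilder, the element name is known in advance so copying only starts at a matching start tag (which makes the "clear the buffer" step unnecessary), and namespace declarations inherited from ancestor elements are not carried into the copy.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
class SubtreeReader {
    /** Returns one DOM element per matching subtree, each built from a serialized copy. */
    static List<Element> subtrees(String xml, String name) throws Exception {
        XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        DocumentBuilder dom = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<Element> result = new ArrayList<>();

        StringWriter copy = null;   // serialized copy of the current subtree
        XMLEventWriter out = null;  // non-null only while inside a matching subtree
        int depth = 0;              // nesting depth within that subtree

        while (in.hasNext()) {
            XMLEvent ev = in.nextEvent();
            boolean opens = ev.isStartElement()
                    && ev.asStartElement().getName().getLocalPart().equals(name);
            if (out == null && opens) {
                copy = new StringWriter();
                out = outFactory.createXMLEventWriter(copy);
            }
            if (out == null) continue;      // outside any wanted subtree: skip the event

            out.add(ev);                    // step 1: serialize the event into the copy
            if (ev.isStartElement()) depth++;
            if (ev.isEndElement() && --depth == 0) {
                out.flush();
                out.close();
                // step 3: build a small DOM tree from the copy
                result.add(dom.parse(new InputSource(new StringReader(copy.toString())))
                        .getDocumentElement());
                out = null;
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book id=\"1\"><title>Dune</title></book><dvd/>"
                   + "<book id=\"2\"><title>Emma</title></book></library>";
        for (Element book : subtrees(xml, "book")) {
            System.out.println(book.getAttribute("id") + ": "
                    + book.getElementsByTagName("title").item(0).getTextContent());
        }
    }
}
```

For simplicity this collects all subtrees into a list. To keep memory use flat on a large file, the same loop could hand each element to a callback (such as the `processBook` method from the question) instead of storing it.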

DR
I think this is my best option. I'll give it a go. Thanks.
Michael Jones