I've got an XML document that is in either a pre- or post-FO-transformed state, and I need to extract some information from it. In the pre case, I need to pull out two tags that represent the pageWidth and pageHeight; in the post case, I need to extract the page-height and page-width parameters from a specific tag (I forget which one it is off the top of my head).

What I'm looking for is an efficient, easily maintainable way to grab these two values. I'd like to read the document only once, fetching both things I need in a single pass.

I initially started writing something that would use BufferedReader + FileReader, but then I'm doing string searching and it gets messy when the tags span multiple lines. I then looked at the DOMParser, which seems like it would be ideal, but I don't want to have to read the entire file into memory if I could help it as the files could potentially be large and the tags I'm looking for will nearly always be close to the top of the file. I then looked into SAXParser, but that seems like a big pile of complicated overkill for what I'm trying to accomplish.

Anybody have any advice? Or simple implementations that would accomplish my goal? Thanks.

Edit: I forgot to mention that, due to various limitations I have, whatever I use has to be built into core Java; I can't use or download any third-party XML tools.

+1  A: 

You can use XPath to search for your tags. Here is a tutorial on forming XPath expressions. And here is an article on using XPath with Java.
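
As a rough illustration of the XPath route using only classes that ship with the JDK, something like the following could work for the pre-transform case. The class name `XPathPageSize` and the method `read` are made up for this sketch, and the `pageWidth`/`pageHeight` element names are taken from the question. Note that this approach parses the whole document into a DOM tree first, so it is probably only suitable when the files are small enough to hold in memory.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of any library.
final class XPathPageSize {

    /** Returns {width, height}; an entry is "" if the element is absent. */
    static String[] read(InputStream in) throws Exception {
        // Parse once into a DOM tree (the whole document is held in memory).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);

        XPath xpath = XPathFactory.newInstance().newXPath();
        // evaluate(String, Object) returns the string value of the first match.
        String width = xpath.evaluate("//pageWidth", doc);
        String height = xpath.evaluate("//pageHeight", doc);
        return new String[] { width, height };
    }
}
```

Calling `XPathPageSize.read(new FileInputStream("my.xml"))` would return the two values as strings; the file name here is only a placeholder.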


An easy-to-use parser (DOM, SAX) is dom4j. It would be considerably easier to use than the built-in SAXParser.

Bozho
+3  A: 

While XPath is very good for querying XML data, I am not aware of a good, fast XPath implementation for Java (they all build at least a DOM model).

I would recommend sticking with StAX. It is extremely fast even for huge files, and its cursor API is rather trivial:

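import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

XMLInputFactory f = XMLInputFactory.newInstance();
// createXMLStreamReader takes a stream or reader, not a file name
InputStream in = new FileInputStream("my.xml");
XMLStreamReader r = f.createXMLStreamReader(in);
try {
  while (r.hasNext()) {
    r.next();
    . . .
  }
} finally {
  r.close();   // frees the parser but does not close the underlying stream
  in.close();
}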

Consult the StAX tutorial and the XMLStreamReader javadocs for more information.
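
To make the loop above concrete for the question's two cases, here is a minimal sketch, not a definitive implementation. The class name `PageSizeReader` and the method `read` are invented for illustration. It assumes the pre-transform values are the text of `pageWidth` and `pageHeight` elements (names from the question), and that the post-transform values are `page-width` and `page-height` attributes on some element. In XSL-FO that element is normally `fo:simple-page-master`, but the sketch does not rely on the element name. It stops reading as soon as both values are found, so the rest of a large file is never parsed.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
final class PageSizeReader {

    /** Returns {width, height}; an entry is null if it was not found. */
    static String[] read(InputStream in) throws XMLStreamException {
        String width = null;
        String height = null;
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            // Leave the loop once both values are known.
            while ((width == null || height == null) && r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = r.getLocalName();
                if ("pageWidth".equals(name)) {
                    // Pre-transform case: the value is the element's text.
                    width = r.getElementText();
                } else if ("pageHeight".equals(name)) {
                    height = r.getElementText();
                } else {
                    // Post-transform case: the values are attributes.
                    // A null namespace means "do not check the namespace".
                    String w = r.getAttributeValue(null, "page-width");
                    String h = r.getAttributeValue(null, "page-height");
                    if (w != null) width = w;
                    if (h != null) height = h;
                }
            }
        } finally {
            r.close(); // frees the parser; the caller still owns the stream
        }
        return new String[] { width, height };
    }
}
```

A caller would open the file, pass the stream to `PageSizeReader.read(...)`, and close the stream afterwards. The same method handles both document states, so the caller does not need to know in advance which one it has.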

incarnate
+1 for StAX ...
Bozho
That seems to be almost exactly what I was looking for. Going to investigate further.
Morinar
This worked perfectly according to what I wanted it to do. Thanks for the input!
Morinar
@Morinar - I'd suggest upvoting, in addition to accepting the answer
Bozho
@Morinar you are welcome; @Bozho thanks :)
incarnate
A: 

Try "XMLDog".

It uses SAX to evaluate XPath expressions.

Santhosh Kumar T