I need to parse potentially large XML files. The schema is already provided to me in several XSD files, so XML binding is strongly preferred. Can I use JAXB to parse a file in chunks, and if so, how?

+5  A: 

This is detailed in the user guide. The JAXB download from http://jaxb.dev.java.net/ includes an example of how to parse one chunk at a time.

When a document is large, it's usually because there are repetitive parts in it. Perhaps it's a purchase order with a large list of line items, or perhaps it's an XML log file with a large number of log entries.

This kind of XML is suitable for chunk processing. The main idea is to use the StAX API, run a loop, and unmarshal individual chunks separately. Your program acts on a single chunk and then throws it away. This way you keep at most one chunk in memory at a time, which allows you to process large documents.
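
As a rough sketch of the loop described above (not the sample shipped with the JAXB RI), something like the following should work. `LogEntry`, the `entry` element name and `forEachEntry` are made up for illustration; in practice the chunk class would be one generated by xjc from your XSD. It assumes the `javax.xml.bind` API is available (bundled with Java 6 through 8; later JDKs need it added as a module or a separate dependency).

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class ChunkedUnmarshal {

    // Hypothetical chunk type; normally this would be an xjc-generated class.
    @XmlAccessorType(XmlAccessType.FIELD)
    public static class LogEntry {
        public String level;
        public String message;
    }

    // Walks the document with StAX and hands each <entry> chunk to the handler.
    public static void forEachEntry(Reader source, Consumer<LogEntry> handler) throws Exception {
        JAXBContext context = JAXBContext.newInstance(LogEntry.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
        try {
            while (reader.hasNext()) {
                if (reader.getEventType() == XMLStreamConstants.START_ELEMENT
                        && "entry".equals(reader.getLocalName())) {
                    // Unmarshal only this subtree. Afterwards the reader is already
                    // positioned past the closing tag, so do not call next() here.
                    JAXBElement<LogEntry> chunk = unmarshaller.unmarshal(reader, LogEntry.class);
                    handler.accept(chunk.getValue());
                    // The chunk goes out of scope here and can be garbage collected.
                } else {
                    reader.next();
                }
            }
        } finally {
            reader.close();
        }
    }
}
```

Calling it would look like `ChunkedUnmarshal.forEachEntry(new FileReader("big.xml"), e -> System.out.println(e.message));` (the file name is just a placeholder). The two-argument `unmarshal(reader, LogEntry.class)` is used so the chunk class does not need an `@XmlRootElement` annotation.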

See the streaming-unmarshalling example and the partial-unmarshalling example in the JAXB RI distribution for more about how to do this. The streaming-unmarshalling example has the advantage that it can handle chunks at an arbitrary nesting level, but it requires you to deal with the push model: the JAXB unmarshaller will "push" each new chunk to you and you'll need to process it right there.

In contrast, the partial-unmarshalling example works in a pull model (which usually makes the processing easier), but this approach has some limitations in databinding portions other than the repeated part.

skaffman
Right, that's one of the sites I found when researching this, but I was unable to find the "streaming-unmarshalling" and "partial-unmarshalling" examples it referred to in section 4.4.1.
John Fawcett
Odd. Where are you looking? I just downloaded the JAR from jaxb.dev.java.net/2.1.12, unpacked it, and there under "samples" is "partial-unmarshalling" and "stream-unmarshalling".
skaffman
+1  A: 

It's confusing because you have to navigate to the following directory to find them: jaxb-ri/dist/samples/samples-src

Stephen