ansaurus

Question

Parse file containing XML Fragments in Java

Answer 1

A:

How about implementing a simple wrapper around InputStream that wraps the input from the file with a root-level tag, and using that as the input to DocumentBuilder.parse()?

If the expected input is small enough to load into memory, read into a string, wrap it with a dummy start/end tag and then use:

DocumentBuilder.parse(new InputSource(new StringReader(string)))
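
A minimal sketch of the in-memory variant, assuming the file holds only fragments with no leading XML declaration. The class name FragmentStringParser, the method parseFragments, and the dummy tag name root are made up for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FragmentStringParser {
    // Hypothetical helper: wraps the fragment text in a dummy <root>
    // element so the parser sees a single top-level element.
    // Assumes the text does not start with an <?xml ...?> declaration.
    static Document parseFragments(String fragments) {
        String wrapped = "<root>" + fragments + "</root>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(wrapped)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Rethrow unchecked so callers are not forced to handle three types.
            throw new IllegalArgumentException("Could not parse wrapped fragments", e);
        }
    }
}
```

The file contents could be loaded with something like new String(Files.readAllBytes(path), StandardCharsets.UTF_8) before calling parseFragments; the fragments then show up as children of the document element.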

Jim Garrison 2010-07-12 20:47:06

Answer 2

A:

You're going to need to create two separate Document objects by breaking the file up into smaller pieces and parsing those pieces individually (or alternatively reconstructing them into a larger document by adding a tag which encloses both of them).

If you can rely on the structure of the file, it should be easy to read the file into a string, search for substrings like <Product and </Product>, and use those markers to cut out a string you can pass into a document builder.
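
The substring search could look roughly like this. It is only a sketch: ProductSplitter and splitProducts are hypothetical names, and it assumes Product elements are never nested, never self-closing, and that no other element name starts with Product:

```java
import java.util.ArrayList;
import java.util.List;

class ProductSplitter {
    // Hypothetical helper: cuts the text into one string per
    // <Product ...>...</Product> block using plain substring searches.
    static List<String> splitProducts(String text) {
        final String open = "<Product";
        final String close = "</Product>";
        List<String> pieces = new ArrayList<>();
        int start = text.indexOf(open);
        while (start >= 0) {
            int end = text.indexOf(close, start);
            if (end < 0) {
                break; // unterminated element: stop rather than guess
            }
            end += close.length();
            pieces.add(text.substring(start, end));
            start = text.indexOf(open, end);
        }
        return pieces;
    }
}
```

Each returned piece is a well-formed document on its own, so it can be handed to DocumentBuilder.parse(new InputSource(new StringReader(piece))) individually.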

Jherico 2010-07-12 20:47:59

Answer 3

+2 A:

If you know this document is never going to be well-formed, make it well-formed yourself. Add a dummy <root> tag after the <?xml...> declaration and a </root> after the last of the data.
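
As a rough sketch of that idea, assuming the declaration (if there is one) sits at the very start of the text with no byte order mark or whitespace before it. RootWrapper and wrapWithRoot are illustrative names only:

```java
class RootWrapper {
    // Hypothetical helper: inserts <root> right after the XML declaration
    // (or at the very start if there is none) and appends </root> at the end.
    static String wrapWithRoot(String text) {
        int insertAt = 0;
        if (text.startsWith("<?xml")) {
            int declEnd = text.indexOf("?>");
            if (declEnd >= 0) {
                insertAt = declEnd + 2; // skip past the closing "?>"
            }
        }
        return text.substring(0, insertAt)
                + "<root>"
                + text.substring(insertAt)
                + "</root>";
    }
}
```

Keeping the declaration in front of the dummy tag matters, because an XML declaration is only allowed at the very beginning of a document.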

Rick 2010-07-12 20:49:07

Answer 4

A:

I'd probably create a SequenceInputStream where you sandwich the real stream with two ByteArrayInputStreams that return some dummy root start tag, and end tag.

Then I'd use the parse method that takes a stream rather than a file name.
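
Something along these lines, as a sketch only. It assumes the underlying stream is UTF-8 (or ASCII-compatible) and does not begin with an XML declaration; SandwichParser and parseWrapped are names invented for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class SandwichParser {
    // Hypothetical helper: sandwiches the real stream between two small
    // in-memory streams holding the dummy start and end tags.
    static Document parseWrapped(InputStream fragments) {
        InputStream head =
                new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8));
        InputStream tail =
                new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8));
        // SequenceInputStream reads each stream to its end, then moves to the next.
        InputStream all = new SequenceInputStream(
                Collections.enumeration(Arrays.asList(head, fragments, tail)));
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(all);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse wrapped stream", e);
        }
    }
}
```

With a file, the call would be along the lines of parseWrapped(new FileInputStream(file)). The content is never held in memory as one string, which is the main advantage over the string-based approach.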

MeBigFatGuy 2010-07-12 20:49:52

Answer 5

A:

I agree with Jim Garrison to some extent: use an InputStream or InputStreamReader and wrap the input in the required tags. It's a simple and easy method. The main problem I can foresee is that you'll need some checks for valid and invalid formatting if you want to use the method for both kinds of data. If the formatting is invalid because the root-level tags are missing, wrap the input with the tags; if it's valid, don't wrap it. If the input is invalid for some other reason, you can also alter the input to correct those formatting issues.

Also, it's probably better to store the input in a collection of strings (of some sort) rather than in a single string, so you won't be as limited in input size. Make each string one line from the file. You should end up with a logical, easy-to-follow structure, which will make it easier to correct other formatting issues in the future.

The hardest part is figuring out what has caused the invalid formatting. In your case, just check for root-level tags: if the tags exist and are formatted correctly, don't wrap; if not, wrap.
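
One simple way to do the "wrap only if needed" check is to let the parser decide: try the input as-is, and wrap it only if that fails. This is a sketch of that swapped-in approach rather than an explicit scan for root-level tags. LenientXmlParser and parseLenient are hypothetical names, and the fallback assumes the input has no XML declaration:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class LenientXmlParser {
    // Hypothetical helper: parses the text unchanged when it is already
    // well-formed, otherwise retries once with a dummy <root> wrapper.
    static Document parseLenient(String text) {
        try {
            return parse(text);
        } catch (SAXException notWellFormed) {
            try {
                return parse("<root>" + text + "</root>");
            } catch (SAXException stillBroken) {
                // Wrapping did not help, so the problem is something else.
                throw new IllegalArgumentException(
                        "Input is malformed for another reason", stillBroken);
            }
        }
    }

    private static Document parse(String xml) throws SAXException {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Parsing twice is wasteful for large inputs, so this suits small files; for big ones, checking the first and last tags up front is probably the better trade-off.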

2010-07-13 02:45:25
