I inherited an "XML" license file containing no root element, but rather two XML fragments (<XmlCreated> and <Product>), so when I try to parse the file, I (expectedly) get an error about a document that is not well-formed.

I need to get both the XmlCreated and Product tags.

Sample XML file:

<?xml version="1.0"?>

<XmlCreated>May 11 2009</XmlCreated>

<!-- License Key file Attributes -->
<Product image ="LicenseKeyFile">

 <!-- MyCompany -->
 <Manufacturer ID="7f">
  <SerialNumber>21072832521007</SerialNumber>
  <ChassisId>72060034465DE1C3</ChassisId>
  <RtspMaxUsers>500</RtspMaxUsers>
  <MaxChannels>8</MaxChannels>
 </Manufacturer>

</Product>

Here is the current code that I use to attempt to load the XML. It does not work, but I've used it before as a starting point for well-formed XML.

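import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class LicenseSigner {
    public static void main(String[] args) {
        try {
            // Backslashes must be escaped inside a Java string literal.
            File file = new File("C:\\path\\LicenseFile.xml");
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Fails here: the file has two top-level elements.
            Document doc = db.parse(file);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}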

At the db.parse(file) line, I get the following Exception:

[Fatal Error] LicenseFile.xml:6:2: The markup in the document following the root element must be well-formed.
org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
 at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
 at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
 at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
 at com.mycompany.licensesigning.LicenseSigner.main(LicenseSigner.java:20)

How would I go about parsing this frustrating file?

A: 

How about implementing a simple wrapper around InputStream that wraps the input from the file with a root-level tag, and using that as the input to DocumentBuilder.parse()?

If the expected input is small enough to load into memory, read into a string, wrap it with a dummy start/end tag and then use:

DocumentBuilder.parse(new InputSource(new StringReader(string)))
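
To make the string approach concrete, here is a minimal sketch. One detail matters for the sample file: the <?xml ...?> declaration is only legal at the very start of a document, so the sketch strips it before adding the wrapper. The class name WrappedFragmentParser, the method parseFragments, and the dummy tag name root are illustrative assumptions, and the sketch assumes the input has no byte-order mark and does not itself use a <root> element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names are illustrative only.
class WrappedFragmentParser {

    /** Wraps the fragments in a dummy <root> element and parses the result. */
    static Document parseFragments(String raw) {
        // Remove a leading XML declaration so it does not end up inside <root>.
        String body = raw.replaceFirst("^\\s*<\\?xml[^>]*\\?>", "");
        String wrapped = "<root>" + body + "</root>";
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(wrapped)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not parseable even after wrapping", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the sample file from the question.
        String raw = "<?xml version=\"1.0\"?>\n"
                + "<XmlCreated>May 11 2009</XmlCreated>\n"
                + "<Product image =\"LicenseKeyFile\">\n"
                + " <Manufacturer ID=\"7f\"><MaxChannels>8</MaxChannels></Manufacturer>\n"
                + "</Product>";
        Document doc = parseFragments(raw);
        Element created = (Element) doc.getElementsByTagName("XmlCreated").item(0);
        Element product = (Element) doc.getElementsByTagName("Product").item(0);
        System.out.println(created.getTextContent());      // May 11 2009
        System.out.println(product.getAttribute("image")); // LicenseKeyFile
    }
}
```

Both fragments end up as children of the dummy root, so they can be looked up from a single Document.
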
Jim Garrison
A: 

You're going to need to create two separate Document objects by breaking the file up into smaller pieces and parsing those pieces individually (or alternatively reconstructing them into a larger document by adding a tag which encloses both of them).

If you can rely on the structure of the file it should be easy to read the file into a string and then search for substrings like <Product and </Product> and then use those markers to create a string you can pass into a document builder.
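
As a rough sketch of that substring idea, the code below cuts each fragment out of the raw text and parses it as its own Document. The class name FragmentSplitter and its methods are hypothetical. The search is deliberately naive: it assumes each tag name occurs once, is not nested inside an element of the same name, and is not a prefix of another tag name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names are illustrative only.
class FragmentSplitter {

    /** Returns the text from "<tag" through the first "</tag>" that follows it. */
    static String extractElement(String raw, String tag) {
        String closeTag = "</" + tag + ">";
        int start = raw.indexOf("<" + tag);
        int end = (start < 0) ? -1 : raw.indexOf(closeTag, start);
        if (start < 0 || end < 0) {
            throw new IllegalArgumentException("No <" + tag + "> element found");
        }
        return raw.substring(start, end + closeTag.length());
    }

    /** Parses one extracted fragment as a standalone document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Fragment is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String raw = "<?xml version=\"1.0\"?>\n"
                + "<XmlCreated>May 11 2009</XmlCreated>\n"
                + "<Product image =\"LicenseKeyFile\"><Manufacturer ID=\"7f\"/></Product>";
        Document created = parse(extractElement(raw, "XmlCreated"));
        Document product = parse(extractElement(raw, "Product"));
        System.out.println(created.getDocumentElement().getTextContent());
        System.out.println(product.getDocumentElement().getAttribute("image"));
    }
}
```
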

Jherico
+2  A: 

If you know this document is always going to be malformed in this way, make it well-formed yourself: add a dummy <root> tag after the <?xml...> declaration and a closing </root> after the last of the data.

Rick
A: 

I'd probably create a SequenceInputStream where you sandwich the real stream between two ByteArrayInputStreams that return a dummy root start tag and end tag.

Then I'd use the parse method that takes a stream rather than a file name.
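
A possible shape for this, as a sketch only: the sample file begins with an <?xml ...?> declaration, which would become illegal once it sits behind the dummy start tag, so the code consumes it from the stream first. The class name SandwichedStreamParser and the tag name root are assumptions, as is an ASCII-compatible encoding with no byte-order mark. It uses InputStream.readNBytes, which needs Java 9 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names are illustrative only.
class SandwichedStreamParser {

    /** Parses a stream of top-level fragments by surrounding it with <root>...</root>. */
    static Document parse(InputStream fragments) {
        try {
            InputStream body = skipXmlDeclaration(fragments);
            InputStream open = new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8));
            InputStream close = new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8));
            // Read order: start tag, then the real data, then the end tag.
            InputStream all = new SequenceInputStream(new SequenceInputStream(open, body), close);
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(all);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not parseable even after wrapping", e);
        }
    }

    /** Consumes a leading "<?xml ... ?>" if present; otherwise leaves the stream untouched. */
    private static InputStream skipXmlDeclaration(InputStream in) throws IOException {
        BufferedInputStream buf = new BufferedInputStream(in);
        buf.mark(5);
        byte[] head = new byte[5];
        int n = buf.readNBytes(head, 0, head.length);
        if (n == 5 && "<?xml".equals(new String(head, StandardCharsets.US_ASCII))) {
            int b;
            do {
                b = buf.read(); // discard bytes up to and including the closing '>'
            } while (b != -1 && b != '>');
        } else {
            buf.reset(); // no declaration: rewind to the start
        }
        return buf;
    }

    public static void main(String[] args) {
        String raw = "<?xml version=\"1.0\"?>\n<XmlCreated>May 11 2009</XmlCreated>\n"
                + "<Product image =\"LicenseKeyFile\"/>";
        Document doc = parse(new ByteArrayInputStream(raw.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getElementsByTagName("XmlCreated").item(0).getTextContent());
    }
}
```

For the real file, the argument would be something like a FileInputStream opened in a try-with-resources block, since the sketch does not close the caller's stream.
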

MeBigFatGuy
A: 

I agree with Jim Garrison to some extent: use an InputStream or InputStreamReader and wrap the input in the required tags. It's a simple and easy method. The main problem I can foresee is that you'll need some checks for valid and invalid formatting (if you want to be able to use the method for both valid and invalid data). If the formatting is invalid because the root-level tag is missing, wrap the input with the tags; if it's valid, don't wrap it. If the input is invalid for some other reason, you can also alter the input to correct those formatting issues.

Also, it's probably better to store the input in a collection of strings (of some sort) rather than in a single string; that way you won't be as limited in input size. Make each string one line from the file. You should end up with a logical, easy-to-follow structure, which will make it easier to correct other formatting issues in the future.

The hardest part is figuring out what caused the invalid formatting. In your case, just check for root-level tags: if the tags exist and are formatted correctly, don't wrap; if not, wrap.
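
One way to sketch the wrap-only-when-needed idea is shown below. Note that it swaps the explicit root-tag check for a simpler test: it tries to parse the input as-is and wraps it only if that first attempt fails. The class name LenientXmlLoader and the tag name root are assumptions, and the failed first attempt may still print a "[Fatal Error]" line to stderr with the default error handler.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names are illustrative only.
class LenientXmlLoader {

    /** Parses well-formed input unchanged; wraps it in <root> only if plain parsing fails. */
    static Document load(String raw) {
        try {
            return parse(raw); // already well-formed: no wrapping needed
        } catch (SAXException notWellFormed) {
            // Drop the XML declaration so it does not end up inside <root>.
            String body = raw.replaceFirst("^\\s*<\\?xml[^>]*\\?>", "");
            try {
                return parse("<root>" + body + "</root>");
            } catch (SAXException stillBroken) {
                // Wrapping did not help, so the problem is something else.
                throw new IllegalArgumentException("Input is invalid for another reason", stillBroken);
            }
        }
    }

    private static Document parse(String xml) throws SAXException {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Unexpected parser setup or I/O failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load("<a><b/></a>").getDocumentElement().getTagName());
        System.out.println(load("<?xml version=\"1.0\"?>\n<a/>\n<b/>").getDocumentElement().getTagName());
    }
}
```

Callers can tell which path was taken by checking whether the document element is the dummy root.
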