Hi,

I am a beginner in SAX and Java.

I am trying to read information from an XML file that is not well-formed.

When I try to use a SAX or DOM parser, I get this error:

The markup in the document following the root element must be well-formed.

My XML file looks like this:

<format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
<format type="driver" t="123412">001;023</format>
   ...

Can I force SAX or DOM to parse XML files that are not well-formed?

Thank you for your help. Haythem

+9  A: 

Your best bet is to make the XML well-formed, probably by pre-processing it a bit. In this case, you can do that simply by adding an XML declaration at the top (even that's optional) and wrapping everything in a root element (which is not optional), like this:

<?xml version="1.0"?>
<wrapper>
    <format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
    <format type="driver" t="123412">001;023</format>
</wrapper>

There I've arbitrarily picked the name "wrapper" for the root element; it can be whatever you like.
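To make that concrete, here is a minimal Java sketch of the same idea using the JDK's built-in DOM parser. It assumes the fragment is already in memory as a string and carries no XML declaration of its own; the class and method names (`WrapAndParse`, `wrap`, `formatTexts`) are hypothetical, and `wrapper` is the same arbitrary root name as above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WrapAndParse {

    // Adds an XML declaration and a synthetic root element around a sequence of
    // sibling elements, turning the fragment into a well-formed document.
    static String wrap(String fragment) {
        return "<?xml version=\"1.0\"?>\n<wrapper>\n" + fragment + "\n</wrapper>";
    }

    // Parses the wrapped fragment with DOM and returns "type=text" for every <format> element.
    static List<String> formatTexts(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wrap(fragment))));
            NodeList nodes = doc.getElementsByTagName("format");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                result.add(e.getAttribute("type") + "=" + e.getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Surface parse failures as an unchecked exception to keep the sketch short.
            throw new IllegalArgumentException("could not parse wrapped fragment", e);
        }
    }

    public static void main(String[] args) {
        // The question's data, without any root element.
        String fragment =
                "<format type=\"filename\" t=\"13241\">0;W650;004;AG-Erzgeb</format>\n"
              + "<format type=\"driver\" t=\"123412\">001;023</format>";
        for (String s : formatTexts(fragment)) {
            System.out.println(s);
        }
    }
}
```

Running `main` prints one line per `format` element, its `type` attribute followed by its text content.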

T.J. Crowder
I'd just like to add that you don't necessarily need to do that modification on the disk, but that you could do it on the fly by providing a filtering `InputStream`/`Reader`. Especially for big files (or reading XML from a URL) this can be very useful. A `SequenceInputStream` could be useful here: http://java.sun.com/javase/6/docs/api/java/io/SequenceInputStream.html
Joachim Sauer
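A minimal sketch of the on-the-fly approach, assuming the data is UTF-8 (or plain ASCII) and has no XML declaration of its own, since a declaration appearing after `<wrapper>` would itself be a well-formedness error. It chains three streams with `SequenceInputStream` and feeds the result to the JDK's SAX parser. `WrappingStream`, `wrapInRoot` and `formatTypes` are hypothetical names, and a `ByteArrayInputStream` stands in for the `FileInputStream` or URL stream you would use in practice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WrappingStream {

    // Returns a stream that yields "<wrapper>", then the original bytes, then "</wrapper>".
    // Nothing is written to disk; the root element is supplied as the parser reads.
    static InputStream wrapInRoot(InputStream body) {
        InputStream open = new ByteArrayInputStream("<wrapper>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</wrapper>".getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(Collections.enumeration(Arrays.asList(open, body, close)));
    }

    // Streams the wrapped input through a SAX parser and collects the "type"
    // attribute of every <format> element.
    static List<String> formatTypes(InputStream body) {
        List<String> types = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("format".equals(qName)) {
                    types.add(atts.getValue("type"));
                }
            }
        };
        try (InputStream in = wrapInRoot(body)) {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return types;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of the (hypothetical) data file.
        byte[] fileContent = ("<format type=\"filename\" t=\"13241\">0;W650;004;AG-Erzgeb</format>\n"
                + "<format type=\"driver\" t=\"123412\">001;023</format>\n")
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(formatTypes(new ByteArrayInputStream(fileContent)));
    }
}
```

Because the root element is added while the parser reads, the original file is never copied or modified, which is what makes this attractive for large files or remote sources.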
Good possibility. Isn't it easier to turn off that check in the parser? Can I override the parse() method so that it ignores the non-well-formed status?
Haythem
Haythem: probably not, because the parser is deep within the library and the behavior of such a parser would be undefined (the XML libraries don't know how to handle XML with more than one root element). Doing it this way instantly makes your XML well-formed, and **all** XML-aware tools can suddenly handle it just fine (provided there are no other incorrect parts in it).
Joachim Sauer
+1  A: 

Hint: using SAX or StAX, you can successfully parse a non-well-formed XML document up to the FIRST well-formedness error it encounters.

(I know that this is not of too much help...)
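As a hedged illustration of that hint, the sketch below feeds the question's original, root-less data to the JDK's SAX parser. The first `<format>` element is reported to the handler before the parser reaches the second top-level element and throws a `SAXParseException`; whatever the handler collected up to that point survives. `ParseUntilError` and `typesBeforeFirstError` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ParseUntilError {

    // Collects the "type" attribute of each <format> element that SAX reports
    // before the first fatal well-formedness error, then stops.
    static List<String> typesBeforeFirstError(String xml) {
        List<String> types = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("format".equals(qName)) {
                    types.add(atts.getValue("type"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (SAXParseException e) {
            // The parser aborts here; everything reported earlier is still in 'types'.
            System.err.println("Stopped at line " + e.getLineNumber() + ": " + e.getMessage());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return types;
    }

    public static void main(String[] args) {
        // The question's data as-is: two top-level elements and no root.
        String original = "<format type=\"filename\" t=\"13241\">0;W650;004;AG-Erzgeb</format>\n"
                + "<format type=\"driver\" t=\"123412\">001;023</format>\n";
        System.out.println(typesBeforeFirstError(original));
    }
}
```

In practice this only recovers the first top-level element of the question's file, which is why wrapping the content in a root element remains the more useful fix.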

Yaneeve
A: 

Because DOM scans your XML file and then builds a tree, it needs a single root node for that tree, like the wrapper element in the first answer. If the parser can't find a single root element, it can't even build the tree. So it's better to do some pre-processing on the XML file before parsing it with DOM or SAX.

jasonfungsing