I stripped some tags that I thought were unnecessary from an XML file. Now when I try to parse it, my SAX parser throws an error and says my file is not well-formed. However, I know every start tag has an end tag. The file's opening tag has a link to an XML schema. Could this be causing the trouble? If so, then how do I fix it?

Edit: I think I've found the problem. My character data contains "&lt" and "&gt" sequences, presumably left over from HTML tags. After being parsed, these are converted to "<" and ">" characters, which seems to bother the SAX parser. Is there any way to prevent this from happening?

+1  A: 

Does the SAX parser not give you details about where it thinks the file is not well-formed?

Have you tried loading the file into an XML editor and checking it there? Do other XML parsers accept it?

The schema shouldn't change whether the XML is well-formed; it may well change whether it's valid. See the Wikipedia entry on XML well-formedness for a little bit more, or the XML specs for a lot more detail :)
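For what it's worth, here is a minimal sketch of how you might get the parser to tell you where it gives up. It assumes Python's standard `xml.sax` module (the question doesn't say which SAX implementation is in use), and `first_wellformedness_error` is just a name made up for illustration. Since SAX reads the input incrementally, this should also cope with a file that is too big to open in an editor.

```python
import io
import xml.sax


def first_wellformedness_error(source):
    """Return (line, column, message) for the first error, or None if it parses.

    `source` may be a filename or a binary file-like object. No schema is
    consulted here: this only checks well-formedness, not validity.
    """
    try:
        # A bare ContentHandler ignores all events; we only care about errors.
        xml.sax.parse(source, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()
    return None


# Tiny in-memory stand-ins for the real file (hypothetical content).
good = io.BytesIO(b"<root><item>text</item></root>")
bad = io.BytesIO(b"<root>\n<item>text</root>")  # <item> is never closed

print(first_wellformedness_error(good))
print(first_wellformedness_error(bad))
```

With a real file you would pass the path instead of a `BytesIO`, then look at the reported line and column.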

EDIT: To represent "&" in text, you should escape it as &amp;

So:

&lt

should be

&amp;lt

(assuming you really want ampersand, l, t).
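A small sketch of that escaping, assuming Python's standard library (`xml.sax.saxutils.escape` for the escaping, `xml.sax` for the round trip). `TextCollector` and `parse_text` are illustrative names, not anything from the question.

```python
import xml.sax
from xml.sax.saxutils import escape


class TextCollector(xml.sax.ContentHandler):
    """Collects all character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so accumulate them.
        self.chunks.append(content)


def parse_text(xml_bytes):
    """Parse a document and return its concatenated character data."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.chunks)


raw = "&lt"                  # literal ampersand, l, t
escaped = escape(raw)        # escape() rewrites &, < and > as entity references
document = "<root>{}</root>".format(escaped).encode("utf-8")

print(escaped)               # the form that belongs in the file
print(parse_text(document))  # what the application sees after parsing

try:
    # Unescaped: "&lt" with no closing ";" is not a complete entity reference.
    parse_text(b"<root>&lt</root>")
except xml.sax.SAXParseException as exc:
    print("rejected:", exc.getMessage())
```

The escaping has to be applied to the raw text before it goes into the file; running it over a file that is already correctly escaped would double-escape everything.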

Jon Skeet
I examined the file in the offending place, and it is only character data (unless I'm counting the lines wrong). Unfortunately, the file is too large to be worked with in a standard editor. I have a root tag, and open and close tags. This remains a mystery.
Jacob Lyles
Try it with another non-DOM parser (XmlReader in .NET, or maybe SAX in Java) and see whether it works there or possibly gives more useful information.
Jon Skeet
"Too large"? Stop using vague words. How many bytes is it? It may be time to switch to a serious editor...
bortzmeyer
Jacob Lyles
+2  A: 

I would suggest putting those tags back in and making sure it still works. Then, if you want to take them out, do it one at a time until it breaks.

However, I question the wisdom of taking them out. If it's your XML file, you should understand it better. If it's a third-party XML file, you really shouldn't be fiddling with it (until you understand it better :-).

paxdiablo
A: 

I would second the recommendation to try parsing it with another XML parser. That should give an indication as to whether it's the document that's wrong, or the parser.

Also, the actual error message might be useful. One fairly common problem, for example, is that the XML declaration (it's optional, but if one is used) must be the very first thing in the file: not even whitespace is allowed before it.
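To illustrate the declaration rule, here is a rough sketch assuming Python's `xml.sax` (backed by expat); the `is_well_formed` helper and the sample documents are made up for the example. If tags were stripped from the top of the file by hand, a stray blank line before the declaration is an easy thing to leave behind.

```python
import xml.sax


def is_well_formed(xml_bytes):
    """Return (True, None) if the bytes parse, else (False, parser message)."""
    try:
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return False, exc.getMessage()
    return True, None


samples = {
    "declaration first": b'<?xml version="1.0"?><root/>',
    "newline before declaration": b'\n<?xml version="1.0"?><root/>',
    # Leading whitespace is fine when there is no declaration at all.
    "newline, no declaration": b"\n<root/>",
}

for label, data in samples.items():
    print(label, is_well_formed(data))
```

Only the middle sample should be rejected, and the parser's message is the kind of detail worth posting along with the question.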

StaxMan
A: 

You could load it into Firefox, if you don't have an XML editor. Firefox shows you the error.

stesch
A: 

I was also having the same problem. I was opening my XML file in Chrome, which showed errors, and Internet Explorer showed them too, but when I used Firefox it worked fine without any error. So what should I do?

sachin