I have written a SAX parser. It works fine when the attribute values are enclosed in double quotes, but if I don't use quotes it throws an exception. I want my parser to handle XML files whose attribute values are not quoted, such as the following:

<root>
    <tag1 attribute1=value1 > my data  </tag1>
</root>

Note that value1 is not inside quotes.

Can I make my parser read the file above? If so, how?

+6  A: 

The SAX parser won't read that because it's not well-formed XML. All attribute values must be enclosed in either single or double quote characters.
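As a quick illustration (not part of the original answer), the sketch below feeds both variants of the question's document to the JDK's built-in SAX parser. The class and method names (`WellFormedCheck`, `isWellFormed`) are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
public class WellFormedCheck {

    // Returns true if the JDK's default SAX parser accepts the text as well-formed XML.
    public static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors, e.g. an unquoted attribute value.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unquoted attribute value: rejected.
        System.out.println(isWellFormed("<root><tag1 attribute1=value1 > my data  </tag1></root>"));
        // Quoted attribute value: accepted.
        System.out.println(isWellFormed("<root><tag1 attribute1=\"value1\" > my data  </tag1></root>"));
    }
}
```

The first call prints false and the second prints true, because the parser reports the missing quote as a fatal error.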

To make your parser read it, you'd first have to tidy/purify/fix it with a relevant library.
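If pulling in a library is not an option, one possible stopgap is to quote the bare values yourself before handing the text to SAX. The sketch below is an assumption-laden heuristic, not a real tidy implementation: the names `AttributeQuoter` and `quoteBareAttributes` are hypothetical, it needs Java 9 or later, and it ignores comments, CDATA sections, and attribute values that contain `>` or `=`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper; a dedicated tidy library is more robust.
public class AttributeQuoter {

    // A start tag: '<' followed by a name character, up to the next '>'.
    private static final Pattern START_TAG = Pattern.compile("<[^<>!?/\\s][^<>]*>");

    // name=value, where value is double-quoted, single-quoted, or bare.
    // A bare value ends at whitespace or at the end of the tag ('>' or '/>').
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "(\\s[\\w:.-]+\\s*=\\s*)(\"[^\"]*\"|'[^']*'|[^\\s\"'<>=]+?(?=\\s|/?>))");

    // Wraps every bare attribute value inside start tags in double quotes.
    public static String quoteBareAttributes(String text) {
        return START_TAG.matcher(text).replaceAll(
                tag -> Matcher.quoteReplacement(quoteInTag(tag.group())));
    }

    private static String quoteInTag(String tag) {
        return ATTRIBUTE.matcher(tag).replaceAll(attr -> {
            String value = attr.group(2);
            boolean quoted = value.startsWith("\"") || value.startsWith("'");
            String fixed = quoted ? value : "\"" + value + "\"";
            return Matcher.quoteReplacement(attr.group(1) + fixed);
        });
    }

    public static void main(String[] args) {
        String broken = "<root>\n    <tag1 attribute1=value1 > my data  </tag1>\n</root>";
        // The tag1 line becomes: <tag1 attribute1="value1" > my data  </tag1>
        System.out.println(quoteBareAttributes(broken));
    }
}
```

The repaired string can then be passed to the existing SAX parser, for example through an `InputSource` wrapping a `StringReader`. Values that are already quoted are matched first and left untouched, so running the method on well-formed input does not change it.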

cletus
In fact, the XML is not well-formed. Validity refers to validation against a DTD, XSD, or any other kind of syntactical constraints applied to an XML document.
Bryan Menard
I meant 'additional constraints'... Sorry.
Bryan Menard
A: 

Try NekoHTML ( http://nekohtml.sourceforge.net/usage.html )

e.g.

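package sample;

import org.apache.xerces.parsers.AbstractSAXParser;
import org.cyberneko.html.HTMLConfiguration;

// A SAX parser backed by NekoHTML's error-tolerant HTML configuration,
// so it emits SAX events for markup that a strict XML parser rejects.
public class HTMLSAXParser extends AbstractSAXParser {
    public HTMLSAXParser() {
        super(new HTMLConfiguration());
    }
}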
Sam