I have a SAXParser with an XMLReader:

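import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

SAXParserFactory saxPF = SAXParserFactory.newInstance();
saxPF.setNamespaceAware(true); // otherwise localName may be empty in startElement
SAXParser sp = saxPF.newSAXParser();
XMLReader xmlR = sp.getXMLReader();
MyHandler myHandler = new MyHandler();
xmlR.setContentHandler(myHandler);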

My handler code uses startElement and endElement to detect when it's inside a tag. It does this by setting a boolean flag and using characters() to grab the value:

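public void startElement(String namespaceURI,
        String localName, String qName, Attributes atts) throws SAXException {
    if (localName.equals("myTag")) this.in_myTag = true;
}

public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
    if (localName.equals("myTag")) this.in_myTag = false;
}

public void characters(char[] ch, int start, int length) {
    if (in_myTag) {
        c.setMyTag(new String(ch, start, length));
    }
}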

The problem is that one of my tags contains "A & B value", and characters() is called separately for "A", "&", "B" and "value". Because each call overwrites the previous one, the final value passed to setMyTag is "value".

<myTag>A & B value</myTag>

http://www.saxproject.org/apidoc/org/xml/sax/helpers/DefaultHandler.html

A: 

Take a look at this: http://stackoverflow.com/questions/2573542/trouble-parsing-quotes-with-sax-parser-javax-xml-parsers-saxparser-on-android-a/2576718#2576718

By the way, a bare & is not a valid XML character; it should be written as &amp;.
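To see the difference concretely, here is a minimal sketch using the JDK's built-in SAX parser (the class name AmpersandCheck and the helper isWellFormed are illustrative, not from the thread). It parses both versions of the sample element with a do-nothing DefaultHandler and reports whether parsing completed; the bare & is rejected as a well-formedness error.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class AmpersandCheck {

    /** Hypothetical helper: returns true if xml parses without a well-formedness error. */
    public static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // DefaultHandler ignores content events; its fatalError() rethrows the exception.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // A bare '&' not followed by a valid entity or character reference ends up here.
            return false;
        } catch (Exception e) {
            // Configuration or I/O problems are unexpected for an in-memory string.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<myTag>A & B value</myTag>"));     // false
        System.out.println(isWellFormed("<myTag>A &amp; B value</myTag>")); // true
    }
}
```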

Fedor
Thanks for linking me to that post
Ally
A: 
<myTag>A & B value</myTag>

(That's not well-formed XML. I assume you mean A &amp; B value.)

In general you can't guarantee that your characters() handler will get called exactly once per element. If there is no text content in the element it won't get called at all; if there are entity references or the text is very long you are likely to get called more than once. Plus of course any comments, PIs or other elements in there will definitely need multiple calls.
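A quick way to observe this is to count the callbacks. The sketch below (class and method names are hypothetical) uses the JDK's default SAX parser. An empty element produces no characters() call at all, and text interrupted by a comment arrives in at least two chunks. The count for the &amp; case depends on the parser, so no number is claimed for it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CharactersCallCounter {

    /** Hypothetical helper: parses xml and returns how many times characters() was invoked. */
    public static int countCalls(String xml) {
        final int[] calls = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                calls[0]++; // count chunks; their contents are ignored here
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println(countCalls("<myTag></myTag>"));                // no text content
        System.out.println(countCalls("<myTag>A<!-- c -->B</myTag>"));    // text split by a comment
        System.out.println(countCalls("<myTag>A &amp; B value</myTag>")); // parser-dependent
    }
}
```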

Whilst it is unusual for a predefined entity reference like &amp; to cause a separate callback to the content handler, there's nothing in the spec to say it can't happen at any time for any (or no) reason. In particular:

SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks

Consequently, a SAX handler must collect every piece of text content sent to it and join them together when endElement occurs, rather than setting the content from a single characters callback.
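Here is a minimal sketch of that approach, assuming a single, non-nested myTag element. It appends every chunk to a StringBuilder and only stores the result in endElement. The myTag field and getMyTag() stand in for the question's c.setMyTag(...), and parseMyTag is a hypothetical convenience helper. It matches on qName because the JDK's SAXParserFactory is not namespace-aware by default, in which case localName can be empty.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MyTagHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private boolean inMyTag = false;
    private String myTag; // stands in for c.setMyTag(...) from the question

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("myTag".equals(qName)) {
            inMyTag = true;
            text.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inMyTag) {
            text.append(ch, start, length); // keep every chunk, however the parser splits it
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("myTag".equals(qName)) {
            inMyTag = false;
            myTag = text.toString(); // only now is the full text content known
        }
    }

    public String getMyTag() {
        return myTag;
    }

    /** Hypothetical helper: parse a string and return the collected myTag text. */
    public static String parseMyTag(String xml) {
        MyTagHandler handler = new MyTagHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.getMyTag();
    }

    public static void main(String[] args) {
        System.out.println(parseMyTag("<myTag>A &amp; B value</myTag>")); // A & B value
    }
}
```

Handling nested or repeated elements would need a stack of buffers rather than a single flag, but the principle is the same: never treat one characters() call as the whole value.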

bobince
Thanks for your help
Ally