This code runs on BlackBerry JDE v4.2.1. It's in a method that makes web API calls that return XML. Sometimes the returned XML is not well formed, and I need to strip out any invalid characters before parsing.

Currently, I get: org.xml.sax.SAXParseException: Invalid character '' encountered.

I'm looking for a fast way to attach an invalid-character stripper to the input stream, so that the stream flows through the validator/stripper and straight into the parse call. In other words, I'm trying to avoid saving the content of the stream.

Existing code:

handler is an override of DefaultHandler
url is a String containing the API URL

hconn = (HttpConnection) Connector.open(url,Connector.READ_WRITE,true);

...

try{
   XMLParser parser = new XMLParser();
   InputStream input = hconn.openInputStream();
   parser.parse(input, handler);
   input.close();
} catch (SAXException e) {
   Logger.getInstance().error("getViaHTTP() - SAXException - "+e.toString());
}
A: 

Use a FilterInputStream. Override FilterInputStream#read to filter the offending bytes.
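
As a rough illustration of byte-level filtering, here is a minimal sketch that extends InputStream directly, in case FilterInputStream is not available on the target platform. The class name ControlByteStrippingInputStream and its helper method are hypothetical. It assumes the response is UTF-8 or another ASCII-compatible encoding, where bytes below 0x20 only ever stand for control characters; it would corrupt UTF-16 input, and it does not catch other invalid characters such as U+FFFE.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical name; wraps any InputStream without relying on FilterInputStream.
class ControlByteStrippingInputStream extends InputStream {
    private final InputStream in;

    ControlByteStrippingInputStream(InputStream in) {
        this.in = in;
    }

    // Control bytes 0x00-0x1F other than tab, LF and CR are not legal XML 1.0
    // characters. Assumption: the encoding is ASCII-compatible (e.g. UTF-8),
    // so these byte values never occur inside a multi-byte sequence.
    static boolean isForbiddenControlByte(int b) {
        return b >= 0 && b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D;
    }

    public int read() throws IOException {
        int next;
        do {
            next = in.read();
        } while (isForbiddenControlByte(next));
        return next; // -1 (end of stream) passes through unchanged
    }

    public void close() throws IOException {
        in.close();
    }
}
```

Only the single-byte read() is overridden; the inherited bulk read methods of InputStream call it repeatedly, so the filter applies to them too, at the cost of one call per byte.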

The problem is that this requires duplicating the character-decoding logic in the stream.
Matthew Flaschen
There may not be a way to avoid that without customizing XMLParser?
JR Lawhorne
RIM doesn't have FilterInputStream http://www.blackberry.com/developers/docs/4.2.1api/index.html
JR Lawhorne
Why not just use a customized XMLParser only when there is a SAXException? It would seem that if you get a bad xml file then it would be best to reject the entire file as the damaged part may lead to bad data being extracted.
James Black
+2  A: 

It's difficult to attach a stripper to the InputStream because streams are byte-oriented. It might make more sense to do it on a Reader. You could make something like a StripReader that wraps another reader and deals with errors. Below is a quick, untested proof of concept for this:

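import java.io.IOException;
import java.io.Reader;

// Wraps another Reader and skips characters that fail the validity check.
public class StripReader extends Reader
{
    private Reader in;

    public StripReader(Reader in)
    {
        this.in = in;
    }

    public boolean markSupported()
    {
        return false;
    }

    public void mark(int readLimit)
    {
        throw new UnsupportedOperationException("Mark not supported");
    }

    public void reset()
    {
        throw new UnsupportedOperationException("Reset not supported");
    }

    // Returns the next character that passes the check, or -1 at end of stream.
    public int read() throws IOException
    {
        int next;
        do
        {
            next = in.read();
            // Note: every value a Reader returns (0x0000-0xFFFF) is a valid code
            // point, so swap in an XML-specific check to actually strip anything.
        } while(!(next == -1 || Character.isValidCodePoint(next)));

        return next;
    }

    public void close() throws IOException
    {
        in.close();
    }

    // Fills cbuf one filtered character at a time via read().
    public int read(char[] cbuf, int off, int len) throws IOException
    {
        int i, next = 0;
        for(i = 0; i < len; i++)
        {
            next = read();
            if(next == -1)
                break;
            cbuf[off + i] = (char)next;
        }
        // Report end of stream only if nothing was read before hitting it.
        if(i == 0 && next == -1)
            return -1;
        else
            return i;
    }

    public int read(char[] cbuf) throws IOException
    {
        return read(cbuf, 0, cbuf.length);
    }
}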

You would then construct an InputSource from the Reader and do the parse using that InputSource.
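
For illustration, the wiring might look roughly like the sketch below. It uses the standard javax.xml.parsers.SAXParser from Java SE instead of the XMLParser class in the question, whose API is not shown here; whether that class accepts an InputSource is something to check. ControlCharStrippingReader and StrippedParseExample are hypothetical names, the reader is a simplified stand-in that only drops forbidden control characters, and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a stripping reader: drops the control characters
// that XML 1.0 forbids and passes everything else through untouched.
class ControlCharStrippingReader extends Reader {
    private final Reader in;

    ControlCharStrippingReader(Reader in) {
        this.in = in;
    }

    public int read(char[] cbuf, int off, int len) throws IOException {
        int count = 0;
        while (count < len) {
            int c = in.read();
            if (c == -1) {
                break; // end of the underlying stream
            }
            if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D) {
                continue; // skip a forbidden control character
            }
            cbuf[off + count++] = (char) c;
        }
        // Signal end of stream only when nothing at all could be read.
        return (count == 0 && len > 0) ? -1 : count;
    }

    public void close() throws IOException {
        in.close();
    }
}

class StrippedParseExample {
    // Decodes the bytes as UTF-8 (an assumption), strips, then parses,
    // without buffering the whole response in memory.
    static void parseStripped(InputStream input, DefaultHandler handler) throws Exception {
        Reader reader = new ControlCharStrippingReader(new InputStreamReader(input, "UTF-8"));
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(reader), handler);
    }
}
```

Because the parser receives a character stream, it uses the characters as given and ignores any encoding named in the XML declaration, so the charset passed to InputStreamReader has to match what the server actually sends.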

Matthew Flaschen
Since Blackberry apparently doesn't have FilterReader either, I modified the above not to use it.
Matthew Flaschen
RIM also doesn't include Character.isValidCodePoint(), so I had to roll my own. But this method does seem to work, on the simulator at least. Hopefully it will also hold up and not be too slow on a real device. Thanks!
JR Lawhorne
You're welcome. Just be sure to test well. It's unavoidably going to slow things down since every character must be (re-)checked. However, I don't think I'm doing any unnecessary copying. P.S. I'm curious as to how you implemented isValidCodePoint.
Matthew Flaschen
It's not going to show up well in this comments block but here is the method I use for validating an XML character: private boolean isValidXMLChar(int ch) { if ((ch == 0x9) || (ch == 0xA) || (ch == 0xD) || ((ch >= 0x20) else return false; }
JR Lawhorne
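
The method in the comment above did not survive formatting intact. For reference, a check along the same lines can be sketched from the Char production in the XML 1.0 specification. This is a reconstruction from the spec, not necessarily the code that was posted, and the class name XmlChars is hypothetical.

```java
// Hypothetical helper class; the method name follows the comment above.
class XmlChars {
    // XML 1.0 "Char" production:
    //   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXMLChar(int ch) {
        return ch == 0x9 || ch == 0xA || ch == 0xD
            || (ch >= 0x20 && ch <= 0xD7FF)
            || (ch >= 0xE000 && ch <= 0xFFFD)
            || (ch >= 0x10000 && ch <= 0x10FFFF);
    }
}
```

One caveat: a Reader hands back UTF-16 code units, so when this test is applied one char at a time, both halves of a surrogate pair (0xD800-0xDFFF) are rejected and characters above U+FFFF get stripped. If the feeds can contain such characters, the reader would need to pass well-formed pairs through.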