I have to parse ugly files that contain nested tags like this:

<p>blah<strong>lah</strong>blah</p>

The nested tags are defined and I don't care about them, but they make XmlPullParser fail:

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserFactory;

XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
parser.setInput(some_reader);
int event;
while ((event = parser.next()) != XmlPullParser.END_DOCUMENT) {
    if (event == XmlPullParser.START_TAG) {
        String tag = parser.getName();
        if (tag == null) {
            continue;
        }
        if ("p".equals(tag.toLowerCase())) {
            // nextText() only accepts text followed directly by the end tag,
            // so the nested <strong> makes it throw:
            // org.xmlpull.v1.XmlPullParserException: expected: /p read: strong
            String text = parser.nextText();
        }
    }
}

Question: is there any chance I could get away without preprocessing the file to strip all the unnecessary tags, and without using a third-party library?

EDIT: Updated the snippet to actually make sense.

A: 

I ended up dropping XmlPullParser and switching to SAXParser. As a bonus, it also performed better in my case.
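For anyone landing here, a minimal sketch of what the SAX approach could look like. This is not the exact code I used: the class and method names (ParagraphTextExtractor, ParagraphHandler, extractParagraphs) are made up for illustration, and it assumes the input is well-formed XML. The idea is to track how deep we are inside a <p> and append every characters() chunk while that depth is non-zero, so nested tags like <strong> are simply skipped over.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, named for illustration only.
public class ParagraphTextExtractor {

    /** Collects the text of every <p> element, ignoring tags nested inside it. */
    static class ParagraphHandler extends DefaultHandler {
        final List<String> paragraphs = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private int depth = 0; // 0 = outside <p>, 1 = directly inside <p>, >1 = in a nested tag

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (depth > 0) {
                depth++;                 // nested tag inside <p>: note it, keep collecting text
            } else if ("p".equalsIgnoreCase(qName)) {
                depth = 1;               // entering a top-level <p>
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (depth > 0) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (depth > 0 && --depth == 0) {
                paragraphs.add(text.toString()); // the closing </p> was reached
            }
        }
    }

    static List<String> extractParagraphs(String xml) throws Exception {
        // The default factory is not namespace-aware, so qName carries the tag name.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ParagraphHandler handler = new ParagraphHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.paragraphs;
    }

    public static void main(String[] args) throws Exception {
        // prints [blahlahblah]
        System.out.println(extractParagraphs("<doc><p>blah<strong>lah</strong>blah</p></doc>"));
    }
}
```

This only uses classes that ship with the JDK (and Android), so no third-party library or preprocessing pass is needed. If the files are not well-formed XML, the parser will still throw a SAXException.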

alex