I have to parse ugly files that contain nested tags like this:

    <p>blah<strong>lah</strong>blah</p>

The nested tags are defined and I don't care about them, only about the text. But they make XmlPullParser fail:

    XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
    parser.setInput(some_reader);
    int event;
    // keep the event returned by next() so it can be inspected in the loop body
    while ((event = parser.next()) != XmlPullParser.END_DOCUMENT) {
        if (XmlPullParser.START_TAG == event) {
            String tag = parser.getName();
            if (tag != null) {
                tag = tag.toLowerCase();
            } else {
                continue;
            }
            if ("p".equals(tag)) {
                // nextText() only accepts text-only elements, so it throws on <strong>
                String text = parser.nextText();
                // and here we go
                // org.xmlpull.v1.XmlPullParserException: expected: /p read: strong
            }
        }
    }

Question: is there any way to get the text out without either preprocessing the file to strip all the unnecessary tags or pulling in a third-party library?
EDIT: Updated the snippet to actually make sense.
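
To make the goal concrete: for the `<p>` above I want the concatenated text with the inner tags ignored, i.e. something that counts nesting depth instead of bailing out on the first child tag. Below is a minimal sketch of that idea. Note that it is written against the JDK's built-in StAX `XMLStreamReader` (`javax.xml.stream`) rather than XmlPullParser, purely so that it runs standalone on a plain JVM. The class and method names (`NestedTextSketch`, `textOfFirst`, `collectText`) are made up for illustration. I would expect the same loop to map onto XmlPullParser's `next()`, `START_TAG`, `END_TAG`, `TEXT` and `getText()`, but I have not verified that variant here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class NestedTextSketch {

    // Hypothetical helper: returns the text of the first <tag> element,
    // ignoring any nested tags, or null if no such element exists.
    static String textOfFirst(String tag, String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && tag.equalsIgnoreCase(reader.getLocalName())) {
                    return collectText(reader);
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed input", e);
        }
    }

    // Assumes the reader is positioned on a START_ELEMENT.
    // Consumes events up to and including the matching end tag.
    static String collectText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        int depth = 1; // we are already inside the element
        while (depth > 0) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    depth++; // nested tag opened: skip the tag, keep its text
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    depth--; // reaches 0 at the matching end tag
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    text.append(reader.getText());
                    break;
                default:
                    break; // comments, processing instructions, etc. are ignored
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // prints: blahlahblah
        System.out.println(textOfFirst("p", "<p>blah<strong>lah</strong>blah</p>"));
    }
}
```

The depth counter is the whole trick: every nested start tag bumps it, every end tag drops it, and the loop stops when the end tag of the original element brings it back to zero.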