This is my first time using SAXParser (I'm using it on Android, but I don't think that makes a difference for this particular issue), and I'm trying to read in data from an RSS feed. So far it's working well for the most part, but I'm having trouble when it gets to a tag that contains HTML-encoded text (e.g. &lt;a href="http://...). The characters() method only reads in the &lt; as a <, then treats the next set of characters as a separate entity, rather than taking the entire contents at once. I would rather it just read the text in as it is, without actually translating the HTML. The code I'm using for my document handler (shortened) is posted below:
@Override
public void startElement(String uri, String localName, String qName, Attributes attrs) throws SAXException {
    if (localName.equalsIgnoreCase("channel")) {
        inChannel = true;
    }
    if (inChannel) {
        if (newFeed == null) newFeed = new Feed();
        if (localName.equalsIgnoreCase("image")) {
            if (feedImage == null) feedImage = new Image();
            inImage = true;
        }
        if (localName.equalsIgnoreCase("item")) {
            if (newItem == null) newItem = new Item();
            if (itemList == null) itemList = new ArrayList<Item>();
            inItem = true;
        }
    }
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
    if (!inItem) {
        if (!inImage) {
            if (inChannel) {
                //Reached end of feed
                if (localName.equalsIgnoreCase("channel")) {
                    newFeed.setItems((ArrayList<Item>)itemList);
                    finalFeed = newFeed;
                    newFeed = null;
                    inChannel = false;
                    return;
                } else if (localName.equalsIgnoreCase("title")) {
                    newFeed.setTitle(currentValue); return;
                } else if (localName.equalsIgnoreCase("link")) {
                    newFeed.setLink(currentValue); return;
                } else if (localName.equalsIgnoreCase("description")) {
                    newFeed.setDescription(currentValue); return;
                } else if (localName.equalsIgnoreCase("language")) {
                    newFeed.setLanguage(currentValue); return;
                } else if (localName.equalsIgnoreCase("copyright")) {
                    newFeed.setCopyright(currentValue); return;
                } else if (localName.equalsIgnoreCase("category")) {
                    newFeed.addCategory(currentValue); return;
                }
            }
        }
        else { //is inImage
            //finished with feed image
            if (localName.equalsIgnoreCase("image")) {
                newFeed.setImage(feedImage);
                feedImage = null;
                inImage = false;
                return;
            } else if (localName.equalsIgnoreCase("url")) {
                feedImage.setUrl(currentValue); return;
            } else if (localName.equalsIgnoreCase("title")) {
                feedImage.setTitle(currentValue); return;
            } else if (localName.equalsIgnoreCase("link")) {
                feedImage.setLink(currentValue); return;
            }
        }
    }
    else { //is inItem
        //finished with news item
        if (localName.equalsIgnoreCase("item")) {
            itemList.add(newItem);
            newItem = null;
            inItem = false;
            return;
        } else if (localName.equalsIgnoreCase("title")) {
            newItem.setTitle(currentValue); return;
        } else if (localName.equalsIgnoreCase("link")) {
            newItem.setLink(currentValue); return;
        } else if (localName.equalsIgnoreCase("description")) {
            newItem.setDescription(currentValue); return;
        } else if (localName.equalsIgnoreCase("author")) {
            newItem.setAuthor(currentValue); return;
        } else if (localName.equalsIgnoreCase("category")) {
            newItem.addCategory(currentValue); return;
        } else if (localName.equalsIgnoreCase("comments")) {
            newItem.setComments(currentValue); return;
        } /*else if (localName.equalsIgnoreCase("enclosure")) {
            To be implemented later
        }*/ else if (localName.equalsIgnoreCase("guid")) {
            newItem.setGuid(currentValue); return;
        } else if (localName.equalsIgnoreCase("pubDate")) {
            newItem.setPubDate(currentValue); return;
        }
    }
}
@Override
public void characters(char[] ch, int start, int length) {
    currentValue = new String(ch, start, length);
}
And an example of the RSS feed I'm trying to parse is this one.
Any ideas?
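One thing that may be relevant here: the SAX ContentHandler documentation says a parser may deliver contiguous character data in a single chunk or split it into several characters() calls, and splits often happen around entity references such as &lt;. A handler that assigns a new String on every call therefore keeps only the last chunk. Below is a minimal sketch of the usual workaround, accumulating text in a StringBuilder and reading it in endElement(). The class name AccumulatingHandler and the helper parseDescriptions are hypothetical, invented for this illustration; the sketch only collects description text and uses qName because it does not enable namespace awareness.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration only; it is not the handler from the
// question and only collects the text of <description> elements.
class AccumulatingHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> descriptions = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Begin every element with an empty buffer.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append instead of overwrite: one text node may arrive in several chunks.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Assumption: namespace awareness is off, so the tag name is in qName.
        if ("description".equalsIgnoreCase(qName)) {
            descriptions.add(buffer.toString().trim());
        }
        // Clear the buffer so trailing whitespace does not leak into the next element.
        buffer.setLength(0);
    }

    // Hypothetical helper: parses an XML string and returns the collected descriptions.
    static List<String> parseDescriptions(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        AccumulatingHandler handler = new AccumulatingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.descriptions;
    }
}
```

In the handler from the question, the equivalent change would presumably be to append in characters() and set currentValue from the buffer at the top of endElement(). Note that the parser still decodes the entities, so the accumulated string holds the markup with a literal < rather than the original &lt; text.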