I have a small test app that parses a few RSS feeds. It works for all of them except one.

I scanned through the feed's XML and noticed that parsing fails as soon as it reaches a tag whose content has a bare "&" in it. So, if I have a tag like this:

<like>beer & barbeque</like>

The log says that it found a string

beer

And it crashes with the exception

Error Domain=NSXMLParserErrorDomain Code=68 "Operation could not be completed. (NSXMLParserErrorDomain error 68.)

The most annoying thing is that I don't even need the data from the problematic tag. Any idea how I can work around this?

A: 

Hpple is supposed to be able to parse "messy" HTML. Maybe it can handle your messy RSS.

scompt.com
+1  A: 

Since the feed is already failing, do a string replace of '& ' (an ampersand followed by a space) with '&amp; ' while you fight it out with the feed publisher to clean up his act.

The feed must be valid XML. Period.
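
A minimal sketch of that replace, written in Python rather than Objective-C only to keep it short. It uses a regular expression with a negative lookahead instead of matching '& ' literally, so that an ampersand with no space after it is caught too, while anything already shaped like an entity or character reference is left alone. The names `BARE_AMPERSAND` and `escape_bare_ampersands` are made up for this example, and the pattern is an assumption about what counts as "bare", not a full XML repair. The same lookahead idea should, in principle, carry over to NSRegularExpression.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that does NOT start something shaped like a reference:
#   &name;   &#38;   &#x26;
# Assumption: anything of the form &name; is an intended entity reference.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(xml_text: str) -> str:
    """Replace each stray '&' with '&amp;'; existing references are untouched."""
    return BARE_AMPERSAND.sub("&amp;", xml_text)


# The problematic element from the question.
broken = "<like>beer & barbeque</like>"
fixed = escape_bare_ampersands(broken)

# The patched text now parses, and the parser hands back the original "&".
element = ET.fromstring(fixed)
print(fixed)
print(element.text)
```

This only buys time: references to entities the feed never declares (for example `&nbsp;` without a DTD) would still be rejected by a strict parser, so the publisher fixing the feed remains the real solution.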

Niels Castle
That's what I first thought of, but this is what I got... Bear in mind that the feed is not in English and has a whole bunch of non-ASCII characters. I tried to convert the data to a string using UTF-8 -> got NULL. I tried to convert the data to a string using ASCII -> got a string which, when converted back to data (no changes), is impossible to parse. I tried to init the string directly with the URL -> got a string filled with junk. The XML header claims that it's using ISO-8859-1 encoding, but as far as I know it's overridden by HTTP to UTF-8.
Vnuce
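
One possible way around the decoding trouble described in the comment above is to skip the string conversion and patch the raw bytes instead. Both ISO-8859-1 and UTF-8 encode "&" as the single byte 0x26, and in neither encoding does that byte occur inside another character, so the replacement does not depend on guessing the encoding correctly. The sketch below is Python and purely illustrative: `patch_feed_bytes` and the sample feed are hypothetical, and the sample assumes the bytes really are ISO-8859-1 as the XML header declares.

```python
import re
import xml.etree.ElementTree as ET

# Byte-level pattern: an "&" not followed by something shaped like
# "name;", "#123;" or "#x1F;". Only ASCII bytes are matched, so the
# non-ASCII characters in the feed pass through unchanged.
BARE_AMPERSAND_BYTES = re.compile(
    rb"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def patch_feed_bytes(raw: bytes) -> bytes:
    """Escape stray '&' bytes without ever decoding the feed."""
    return BARE_AMPERSAND_BYTES.sub(b"&amp;", raw)


# Hypothetical feed with non-ASCII text ("\u00f8" is the letter o-slash),
# encoded the way its XML header declares.
raw_feed = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<rss><like>\u00f8l & p\u00f8lser</like></rss>"
).encode("iso-8859-1")

# The parser receives bytes and picks the encoding up from the XML header.
root = ET.fromstring(patch_feed_bytes(raw_feed))
print(root.find("like").text)
```

If the HTTP response and the XML header disagree about the encoding, this patch does not resolve that conflict; it only keeps the ampersand fix from making things worse.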
"The feed must be valid XML. Period." I'm with you 100%, I'm just trying to be nice... for now :)
Vnuce
Post the URL here if it's a public feed - and we'll have a look at it.
Niels Castle
Can't do that, I'm afraid... Never mind, I persuaded the publisher to fix the feed. Thanks for the effort :)
Vnuce
That's OK, it sounds like you ended up with the best solution after all: getting the publisher to fix his feed. Good work!
Niels Castle