I'm parsing some HTML with NSXMLParser, and it hits a parser error any time it encounters an ampersand. I could filter out the ampersands before parsing, but I'd rather parse everything that's there.

It's giving me error 68, NSXMLParserNAMERequiredError: Name is required.

My best guess is that it's a character set issue. I'm a little fuzzy on the world of character sets, so I'm thinking my ignorance is biting me in the ass. The source HTML uses charset iso-8859-1, so I'm using this code to initialize the parser:

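// Decode the raw bytes as ISO-8859-1 (Latin-1), then re-encode as UTF-8 for the parser.
NSString *dataString = [[[NSString alloc] initWithData:data encoding:NSISOLatin1StringEncoding] autorelease];
// dataUsingEncoding:allowLossyConversion: already returns an autoreleased object, so it must not be autoreleased again.
NSData *dataEncoded = [dataString dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
NSXMLParser *theParser = [[NSXMLParser alloc] initWithData:dataEncoded];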

Any ideas?

+1  A: 

Are you sure you have valid XML? Special characters like & are required to be escaped, so in the raw XML file you should see &amp; rather than a bare &.
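
If pre-filtering the input is acceptable after all, the escaping rule can be sketched in plain C (which also compiles as Objective-C). This is only an illustration under stated assumptions: escape_bare_ampersands and starts_reference are hypothetical names, not part of any Apple or libxml2 API, and the sketch only rewrites ampersands that do not already begin something shaped like &name; or &#number;. HTML-only named entities such as &nbsp; are left untouched, so an XML parser may still reject them as undeclared.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if s (which points at '&') begins something shaped like a
   reference, e.g. "&amp;", "&#38;" or "&#x26;"; returns 0 for a bare '&'. */
static int starts_reference(const char *s)
{
    const char *p = s + 1;
    if (*p == '#') {
        p++;
        if (*p == 'x' || *p == 'X') {
            p++;
            if (!isxdigit((unsigned char)*p)) return 0;
            while (isxdigit((unsigned char)*p)) p++;
        } else {
            if (!isdigit((unsigned char)*p)) return 0;
            while (isdigit((unsigned char)*p)) p++;
        }
    } else {
        if (!isalpha((unsigned char)*p)) return 0;
        while (isalnum((unsigned char)*p)) p++;
    }
    return *p == ';';
}

/* Hypothetical helper: returns a newly malloc'd copy of `in` with every
   bare '&' replaced by "&amp;". The caller frees the result.
   Returns NULL if the allocation fails. */
char *escape_bare_ampersands(const char *in)
{
    assert(in != NULL);
    size_t len = strlen(in);
    /* Worst case: every byte is '&' and grows to the 5 bytes of "&amp;". */
    char *out = malloc(len * 5 + 1);
    char *w = out;
    if (out == NULL) return NULL;
    for (const char *r = in; *r != '\0'; r++) {
        if (*r == '&' && !starts_reference(r)) {
            memcpy(w, "&amp;", 5);
            w += 5;
        } else {
            *w++ = *r;
        }
    }
    *w = '\0';
    return out;
}
```

The result would be escaped text that can then be handed to the parser in place of the original. This treats one symptom only; it does nothing about unclosed tags or the other ways real-world HTML fails to be well-formed XML.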

Kendall Helmstetter Gelner
+5  A: 

To the other posters: of course the XML is invalid... it's HTML!

You probably shouldn't be trying to use NSXMLParser for HTML. Use libxml2 instead; it includes an HTML parser that tolerates markup that isn't well-formed XML.

For a closer look at why, check out this article.

Benjamin Cox
Okay, then. Wrong tool for the job? Thanks for the tip. I may have to do that.
Silromen
Great point about the HTML; the NSXMLParser part threw me off. libxml2 seems like a very reasonable alternative. See this previous SO question: http://stackoverflow.com/questions/405749/parsing-html-on-the-iphone
Epsilon Prime