libXML relaxed HTML parsing | ansaurus

Q:

libXML relaxed HTML parsing

Hi,

I am trying to scrape some content from an HTML page. I'm using libxml2 and htmlReadMemory to get an xmlDocPtr. The HTML is simple, but it has a problem. It's basically the following:

<tr><td><tr><td>Some content</td></tr></td></tr>

libxml doesn't like the nested tr/td elements. It keeps giving me the following errors:

HTML parser error : Unexpected end tag : td
      </TD>
           ^
HTML parser error : Unexpected end tag : tr
    </TR>

I am already using the HTML_PARSE_RECOVER option.

At this point nothing I do lets libxml parse this HTML. I can't change the HTML because I have no access to it.

Does anyone have any idea how I can get libxml to parse this sort of HTML?

Thanks

A:

What's the exact call you're using to parse? I'd suggest combining these options if you don't want any error or warning output:

HTML_PARSE_RECOVER|HTML_PARSE_NOERROR|HTML_PARSE_NOWARNING

bosmacs 2010-09-17 19:25:39

I do this: theDoc = htmlReadMemory([inData bytes], [inData length], NULL, enc, HTML_PARSE_RECOVER | HTML_PARSE_NOWARNING | HTML_PARSE_NOBLANKS);

Felix Khazin 2010-09-17 19:29:41

Does using HTML_PARSE_NOERROR still parse the document even if there are errors in the HTML?

Felix Khazin 2010-09-17 19:30:42

Actually, I added HTML_PARSE_NOERROR and now it's working. Thanks for that!

Felix Khazin 2010-09-17 19:35:52

I believe libxml will still correctly parse the document in most cases, but it probably depends on how badly mangled it is.

bosmacs 2010-09-17 19:42:27
