I didn't find anything about parsing HTML in the XML::LibXML::Reader documentation. And I tried to parse a HTML-site and it didn't work. Is my conclusion, that XML::LibXML::Reader doesn't work with HTML right?
+2
A:
Unless it's really XHTML, then no. XML is much more restrictive than HTML is, and XML parsers normally can't parse HTML.
HTML::TokeParser (or its base class HTML::PullParser) are the most similar to XML::LibXML::Reader (but not all that similar).
You might want to look at HTML-Tree for something similar to LibXML that does work with HTML. There's also HTML::TreeBuilder::LibXML, which wraps an even more LibXML-compatible interface around HTML-Tree.
cjm
2010-04-23 08:17:42
+1
A:
No, but HTML::TreeBuilder::LibXML implements a compatible interface on an HTML paser.
David Dorward
2010-04-23 08:22:02