I didn't find anything about parsing HTML in the XML::LibXML::Reader documentation, and when I tried to parse an HTML page with it, it didn't work. Am I right to conclude that XML::LibXML::Reader doesn't work with HTML?

+2  A: 

Unless it's really XHTML, no. XML is much more restrictive than HTML, and XML parsers normally can't parse HTML.

HTML::TokeParser (or its base class, HTML::PullParser) is the closest thing to XML::LibXML::Reader, though it is not all that similar.
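To give a rough idea of the pull-parsing style, here is a minimal sketch with HTML::TokeParser. The sample markup and the variable names are made up for illustration; the HTML deliberately leaves its `<p>` elements unclosed, which a strict XML parser would reject.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Illustrative input only: the <p> elements are never closed and <br> is bare.
my $html = '<p>First <a href="http://example.com/a">link A</a><br>'
         . '<p>Second <a href="http://example.com/b">link  B</a>';

# Passing a scalar reference makes the parser read from the string.
my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

my @links;
# get_tag('a') skips ahead to the next <a> start tag and returns undef at end of input.
while (my $tag = $parser->get_tag('a')) {
    # For a start tag, $tag is [ tagname, \%attr, \@attrseq, original text ].
    my $href = $tag->[1]{href};
    # Collect the text up to the closing </a>, with whitespace collapsed.
    my $text = $parser->get_trimmed_text('/a');
    push @links, [ $href, $text ];
}

print "$_->[0] => $_->[1]\n" for @links;
```

As with XML::LibXML::Reader, you pull tokens forward one at a time instead of building a whole tree first, but the method names and the token structure are different.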

You might want to look at HTML-Tree for something similar to LibXML that does work with HTML. There's also HTML::TreeBuilder::LibXML, which wraps an even more LibXML-compatible interface around HTML-Tree.
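If a tree is what you want, a minimal sketch with HTML::TreeBuilder (from the HTML-Tree distribution) could look like this. The sample markup is made up, and the `<li>` elements are left unclosed on purpose to show that the parser tolerates it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Illustrative input only: the <li> elements are never closed.
my $html = '<ul><li><a href="/one">One</a><li><a href="/two">Two</a></ul>';

# Parse the whole string into a tree of HTML::Element objects.
my $tree = HTML::TreeBuilder->new_from_content($html);

# In list context, look_down returns every matching element in document order.
my @hrefs = map { $_->attr('href') } $tree->look_down(_tag => 'a');

print "$_\n" for @hrefs;

# Free the tree explicitly; older HTML-Tree versions need this to release memory.
$tree->delete;
```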

cjm
+1  A: 

No, but HTML::TreeBuilder::LibXML implements a compatible interface on top of an HTML parser.

David Dorward