I have been developing Java programs that parse the HTML source of web pages using various HTML parsers such as Jericho and NekoHtml.

Now I want to develop parsers in PHP. Before starting, I want to know: are there any HTML parsers available that I can use with PHP to parse HTML code?

+2  A: 

PHP's built-in DOM parser class does a very good job. There are many other XML parsers, too.

soulmerge
+1, thanks
Yatendra Goel
+2  A: 

Check out DOMDocument.

Example #1 Creating a Document

<?php
$doc = new DOMDocument();
$doc->loadHTML("<html><body>Test<br></body></html>");
echo $doc->saveHTML();
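
Since the question is about parsing rather than creating documents, here is a minimal sketch of reading data back out of a loaded page with DOMXPath. The sample markup and the `$links` array are made up for illustration; only DOMDocument and DOMXPath come from PHP itself.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<html><body><a href="/first">First</a><p>Text</p><a href="/second">Second</a></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// DOMXPath runs XPath queries against the parsed tree.
$xpath = new DOMXPath($doc);

// Collect the href attribute and link text of every anchor that has an href.
$links = [];
foreach ($xpath->query('//a[@href]') as $anchor) {
    $links[] = [
        'href' => $anchor->getAttribute('href'),
        'text' => trim($anchor->textContent),
    ];
}

print_r($links);
```

For simple lookups, `$doc->getElementsByTagName('a')` would work as well; XPath is just convenient once the selection gets more specific.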
Yada
+1, thanks
Yatendra Goel
+1  A: 

DOM is pretty good for this. It can also deal with invalid markup. However, it will throw undocumented errors and exceptions in cases of imperfect markup, so I suggest you filter HTML with HTMLPurifier or some other library before loading it with the DOM.
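
Whether or not the input is filtered first, one way to keep parse warnings under control is libxml's internal error handling. This is a rough sketch of that different technique, not of HTMLPurifier itself; the malformed sample string is hypothetical, and which messages libxml reports depends on the installed libxml version.

```php
<?php
// Hypothetical malformed input: unclosed <p> and <b>, plus an unknown tag.
$html = '<html><body><p>Unclosed <b>bold<madeup>text</body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Each entry is a LibXMLError object with level, code, line and message.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore whatever error-handling mode was active before.
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// The parser still builds a tree from the repaired markup.
$text = $loaded ? $doc->getElementsByTagName('body')->item(0)->textContent : '';
echo $text, "\n";
```

The collected errors can be logged or ignored as needed, while the document stays usable for querying.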

Richard Knop
+1 for "filter HTML with HTMLPurifier or some ..."
Yatendra Goel