I need to process HTML submitted to my web application, and I don't want to munge the whole thing with regular expressions. Which tokenizer approach and/or software should I use?

+2  A: 

I would use the DOMDocument::loadHTML method to load the HTML document. If you want simpler handling than the DOMDocument methods offer, you can convert the result to a SimpleXML object with simplexml_import_dom().
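A minimal sketch of that approach, assuming the submitted markup arrives as a string. The `$html` value, the variable names, and the use of `libxml_use_internal_errors()` to silence warnings on malformed input are illustrative choices rather than requirements:

```php
<?php
// Hypothetical input standing in for HTML submitted by a user.
$html = '<div><p class="intro">Hello <b>world</b></p><p>Second paragraph</p></div>';

$doc = new DOMDocument();

// Submitted HTML is often malformed; collect libxml warnings
// internally instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Option 1: walk the tree with the DOM API.
$paragraphs = $doc->getElementsByTagName('p');
$firstText  = $paragraphs->item(0)->textContent;

// Option 2: convert to SimpleXML for simpler traversal.
// loadHTML() wraps fragments in <html><body>, so the path starts at body.
$xml       = simplexml_import_dom($doc);
$classAttr = (string) $xml->body->div->p[0]['class'];

echo $firstText, PHP_EOL;
echo $classAttr, PHP_EOL;
```

Note that loadHTML() adds the html and body wrapper elements around a fragment, so paths in the SimpleXML object begin at the html root rather than at your own outermost element.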

Gumbo