I am looking for good methods of manipulating HTML in PHP. For example, the problem I currently have is dealing with malformed HTML.

I am getting input that looks something like this:

<div>This is some <b>text

As you can see, the HTML is missing closing tags. I could use a regex or an XML parser to solve this problem. However, it is likely that I will have to do other DOM manipulation in the future. I am wondering if there are any good PHP libraries that handle DOM manipulation similar to how JavaScript deals with it.

+10  A: 

PHP has a PECL extension that gives you access to the features of HTML Tidy. Tidy is a pretty powerful library that should be able to take code like that and close tags in an intelligent manner.

I use it to clean up malformed XML and HTML sent to me by a classified ad system prior to import.
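
As a rough illustration of the repair step, here is a minimal sketch that runs the fragment from the question through Tidy. It assumes the tidy extension is installed and enabled; the two configuration options are just one reasonable choice, not the only way to set it up.

```php
<?php
// Assumes the tidy extension is available (check with extension_loaded('tidy')).
$input = '<div>This is some <b>text';

$config = [
    'show-body-only' => true, // return only the repaired fragment, not a full document
    'output-xhtml'   => true, // emit well-formed XHTML markup
];

// tidy_repair_string() parses the markup, repairs it, and returns the result as a string.
$repaired = tidy_repair_string($input, $config, 'utf8');

echo $repaired, PHP_EOL;
```

The repaired string should contain the closing `</b>` and `</div>` tags that the input was missing; the exact whitespace depends on the Tidy configuration.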

ceejayoz
+1  A: 

For manipulating the DOM, I think that what you're looking for is this. I've used it to parse HTML documents from the web and it worked fine for me.
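
One option for this kind of JavaScript-style DOM work, and not necessarily the one linked above, is PHP's bundled DOM extension. A minimal sketch, assuming the extension is enabled (it is by default); the `highlight` class name is only illustrative:

```php
<?php
$input = '<div>This is some <b>text';

$doc = new DOMDocument();

// Malformed markup can trigger libxml warnings; collect them instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($input); // the parser closes the dangling <b> and <div> while building the tree
libxml_clear_errors();

// Familiar DOM calls, much like in the browser.
$bold = $doc->getElementsByTagName('b')->item(0);
$bold->setAttribute('class', 'highlight');
$bold->appendChild($doc->createTextNode('!'));

// Serialize only the <div> subtree rather than the whole generated document.
$div    = $doc->getElementsByTagName('div')->item(0);
$output = $doc->saveHTML($div);

echo $output, PHP_EOL;
```

Because `loadHTML()` wraps fragments in `<html>` and `<body>` elements, passing a node to `saveHTML()` is a simple way to get back just the part you care about.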

Juan
+4  A: 

I've found PHP Simple HTML DOM to be the most useful and straightforward library yet. Better than PECL, I would say.

I've written an article on how to use it to scrape MySpace artist tour dates (just an example). Here's a link to the PHP Simple HTML DOM Parser.

bryan
+1 Used it before and it has worked pretty well so far
Marcel Tjandraatmadja