I'm looking for something like http://validator.nu/. I'll be validating HTML input (a string), and I want to notify the user about problems such as missing end tags or attributes that are not allowed on a given element (basically HTML 4.01 Strict validation).

Side note: I won't be dealing with XML/DTDs, and I don't want to correct the user's input the way http://htmlpurifier.org/ does.

+4  A: 

Hi,

For HTML validation from PHP, the tidy extension might do just what you want. Quoting the manual:

"Tidy is a binding for the Tidy HTML clean and repair utility which allows you to not only clean and otherwise manipulate HTML documents, but also traverse the document tree."

The example given on the tidy::__construct manual page looks like this:

$html = <<< HTML

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>title</title></head>
<body>
<p>paragraph <bt />
text</p>
</body></html>

HTML;

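// Parse the markup; tidy records anything it objects to as it goes.
$tidy = new tidy();
$tidy->parseString($html);

// Run tidy's clean-and-repair pass on the parsed document.
$tidy->cleanRepair();

// errorBuffer holds the accumulated errors and warnings as plain text.
if ($tidy->errorBuffer) {
    var_dump($tidy->errorBuffer);
}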

It gives this output:

string 'line 8 column 14 - Error: <bt> is not recognized!
line 8 column 14 - Warning: discarding unexpected <bt>' (length=104)

A couple of other methods seem interesting too, btw ;-)


Note that this extension has to be installed and enabled on your web server, though: there should be a "tidy" section in the output of phpinfo().
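
Since you only want to report problems and not repair the input, here is a rough sketch of how the error buffer could be turned into something you can show to the user. The function names parseTidyReport and validateHtml are made up for this example, and the regular expression assumes every report line looks like the two in the output above ("line N column M - Level: message"); lines in any other shape are skipped.

```php
<?php
// Hypothetical helper (not part of the tidy extension): turns the plain-text
// error buffer into a list of structured entries.
function parseTidyReport(string $buffer): array
{
    $entries = [];
    foreach (preg_split('/\R/', $buffer, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        // Assumed line shape: "line 8 column 14 - Error: <bt> is not recognized!"
        if (preg_match('/^line (\d+) column (\d+) - (\w+): (.*)$/', $line, $m)) {
            $entries[] = [
                'line'    => (int) $m[1],
                'column'  => (int) $m[2],
                'level'   => $m[3],
                'message' => $m[4],
            ];
        }
    }
    return $entries;
}

// Hypothetical wrapper: reports problems only and never returns repaired
// markup. Returns null when the tidy extension is not loaded.
function validateHtml(string $html): ?array
{
    if (!extension_loaded('tidy')) {
        return null;
    }
    $tidy = new tidy();
    $tidy->parseString($html);
    // errorBuffer may be null when tidy has nothing to report.
    return parseTidyReport((string) $tidy->errorBuffer);
}
```

Calling validateHtml($html) with the document from the example should then give one entry per reported line, and an empty array would mean tidy had nothing to complain about.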

Pascal MARTIN
+1  A: 

So, I ended up using the official W3C Validator SOAP web service because its reports are far superior to Tidy's warnings, and it was exactly the tool I needed. I did have to learn some SOAP and namespace rules, but it was worth it :)
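
For anyone going the same route, here is a minimal sketch of the namespace part in PHP. It only parses a response string that has already been fetched; the request itself is left out. The layout is an assumption here: each problem is taken to be an m:error element with m:line, m:col and m:message children under an m:errorlist, in the namespace http://www.w3.org/2005/10/markup-validator. Check that against a real response before relying on it. extractValidatorErrors is a made-up name.

```php
<?php
// Hypothetical helper: pulls the reported errors out of a SOAP response string.
// The namespace URI and element names below are assumptions about the
// validator's response format, not something confirmed in this thread.
function extractValidatorErrors(string $soapXml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($soapXml)) {
        throw new InvalidArgumentException('Response is not well-formed XML.');
    }

    // Bind a prefix of our own to the namespace; the prefix used inside the
    // response does not have to match the one used in the XPath queries.
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('m', 'http://www.w3.org/2005/10/markup-validator');

    $errors = [];
    foreach ($xpath->query('//m:errorlist/m:error') as $error) {
        $errors[] = [
            'line'    => (int) $xpath->evaluate('string(m:line)', $error),
            'column'  => (int) $xpath->evaluate('string(m:col)', $error),
            'message' => trim($xpath->evaluate('string(m:message)', $error)),
        ];
    }
    return $errors;
}
```

Without the registerNamespace call the queries match nothing, because the elements live in a namespace and an unprefixed XPath name only matches elements that have none.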

meder