Hello, I need to clean up some HTML code by removing the <style> and <link> tags inside the <body> tag. I'm already using PHP Tidy for some of the cleanup, but I could not find a way to remove those tags with it.

Do you have a solution? Or maybe another PHP class for cleaning up markup...

A:

I don't know how to do that with Tidy, but you can use DOM:

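libxml_use_internal_errors(true);          // don't print parser warnings for sloppy real-world markup
$dom = new DOMDocument;                    // init new DOMDocument
$dom->loadHTML($html);                     // load HTML into it
$xpath = new DOMXPath($dom);               // create a new XPath
$nodes = $xpath->query('//body//style');   // find all style elements at any depth inside body
foreach ($nodes as $node) {                // iterate over found elements
    $node->parentNode->removeChild($node); // remove the complete style node
}
echo $dom->saveHTML();                     // output cleaned HTML

For the <link> elements, change the XPath to //body//link, or match both in one pass with //body//style | //body//link. Note that //body/style (single slash) only matches style elements that are direct children of body, not ones nested inside a div.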
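The same approach can be wrapped in a reusable function that removes both tag types with a single union query. This is a minimal sketch, assuming PHP 7+ with the DOM extension enabled; the function name stripBodyStylesAndLinks and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper (not part of any library): removes every <style> and
// <link> element found anywhere inside <body>, leaving <head> untouched.
function stripBodyStylesAndLinks(string $html): string
{
    // Collect libxml warnings instead of printing them, remembering the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    $xpath = new DOMXPath($dom);
    // Union query: style and link elements at any depth under body.
    // Elements in <head> are not matched, so they are kept.
    foreach ($xpath->query('//body//style | //body//link') as $node) {
        $node->parentNode->removeChild($node);
    }
    return $dom->saveHTML();
}

$html = '<html><head><style>h1{}</style></head><body>'
      . '<div><style>p{}</style></div><link rel="stylesheet" href="x.css"><p>Hi</p>'
      . '</body></html>';
echo stripBodyStylesAndLinks($html);
```

Stylesheets declared in <head> survive because the query is anchored at body. Also expect saveHTML() to normalize the markup (for example by adding a default doctype), so the output will not be byte-identical to the input apart from the removed tags.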

Gordon
Nice one. I hadn't considered that before.
CaseySoftware
Thank you. That did the trick.
Franck
A: 

An alternative to Tidy is HTML Purifier: http://htmlpurifier.org/

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.


I made this a separate answer since it is completely unrelated to the DOM solution.

Gordon
Indeed, I will have a look at HTML Purifier, which seems to be a much more efficient solution.
Franck