Hi, as you can see from the subject, I am looking for a tool to clean up an HTML string in PHP by removing an element identified by its HTML id attribute. For example:

Given the following PHP string, I wish to clean the HTML by removing the block11 element:

$test = ' 
<div id="block1">
    <div id="block11">Hello1 <span>more html here...</span></div>
    <div id="block12">Hello2 <span>more html here...</span></div>
</div>
<div id="block2"></div>
';

It should become:

$test = ' 
<div id="block1">
    <div id="block12">Hello2 <span>more html here...</span></div>
</div>
<div id="block2"></div>
';

I already tried the tool from htmlpurifier.org but couldn't get the desired result. The only things I achieved were removing elements by tag, erasing ids, and erasing classes.

Is there any simple way to achieve this using HTML Purifier or another tool?

Thanks in advance,

+2  A: 

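As a general solution for manipulating HTML data, I would recommend loading it into a DOMDocument, manipulating the DOM there, and getting the result back as an HTML string with DOMDocument::saveHTML.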

Note: it'll add some tags around your HTML, as DOMDocument::saveHTML generates the HTML that corresponds to a full HTML document :-(

A couple of str_replace calls to remove those might be OK, I suppose... It's not the hardest part of the work, and should work fine.
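
To make the approach above concrete, here is a minimal sketch, not a definitive implementation. The function name removeElementById is hypothetical (it is not part of PHP or of the question), and the sketch assumes the id contains no single quote and that the input is a non-empty HTML fragment. Instead of str_replace, it serializes only the children of the generated body element, which is one possible way to avoid the extra wrapper tags.

```php
<?php
// Hypothetical helper (not from the original thread): removes the element
// with the given id from an HTML fragment and returns what is left.
function removeElementById(string $html, string $id): string
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumption: $id contains no single quote, so it can be embedded as-is.
    $xpath = new DOMXPath($doc);
    $node = $xpath->query("//*[@id='" . $id . "']")->item(0);

    if ($node !== null && $node->parentNode !== null) {
        // removeChild() has to be called on the node's direct parent.
        $node->parentNode->removeChild($node);
    }

    // loadHTML() wraps the fragment in <html><body>; output only the
    // children of <body> so the wrapper tags are not included.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

$test = '
<div id="block1">
    <div id="block11">Hello1 <span>more html here...</span></div>
    <div id="block12">Hello2 <span>more html here...</span></div>
</div>
<div id="block2"></div>
';

echo removeElementById($test, 'block11');
```

The whitespace around the removed element stays in the output, so the result may contain an extra blank line where block11 used to be.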

Pascal MARTIN
The right tool for the right job. :-)
Edward Z. Yang
A: 

Hi Pascal, that's a good one.

I was trying to go that way, but I am receiving an error:

$html = '<div id="coco"><div id="test"><div id="testefdefde">decvedv</div></div></div>';
$doc = new DOMDocument();
$doc->validateOnParse = true;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$coco = $xpath->query("//*[@id='test']")->item(0);
$doc->removeChild($coco);
echo $doc->saveHTML();

I get the following error:

object(DOMElement)#4 (0) { } Fatal error: Uncaught exception 'DOMException' with message 'Not Found Error'

What's wrong? Looking at the docs, removeChild should receive a DOMNode, and DOMElement is a subclass of it. So it should work, no?