I have an HTML page that I want to edit. I want to remove a certain section, like the following:

<ul class="agentDetail">
........
.......
........
</ul>

I want to be able to remove the tags and all the content between them. The idea is to edit a page and redisplay it minus some data that I don't want to be seen (hence the removal of some sections).

How can I do this in PHP?

+2  A: 

I would recommend parsing the markup into a DOM document, then using XPath to select the nodes you want to remove. Here's a starting point:

$dom = getDom("http://www.stackoverflow.com");
parseDom($dom);

function getDom($url)
{
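    // file_get_contents() stands in for the original fetch helper, which was
    // called on $this outside of any class (requires allow_url_fopen).
    $contentUtf8 = file_get_contents($url);
    $htmlData = mb_convert_encoding($contentUtf8, 'HTML-ENTITIES', "UTF-8");

    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->substituteEntities = false;
    $dom->preserveWhiteSpace = false;
    // @ suppresses the warnings loadHTML() emits for malformed markup.
    @$dom->loadHTML($htmlData);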

    return $dom;
}

function parseDom($dom)
{
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//ul[@class='agentDetail']");

    // manipulate nodes here...
}
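
To make the "manipulate nodes here" step concrete, here is a minimal sketch of one way the removal could look. The function name `removeAgentDetail` and the sample markup are hypothetical, chosen only for illustration; the XPath expression is the one from the code above.

```php
<?php
// Hypothetical helper: strips every <ul class="agentDetail"> from an HTML string.
function removeAgentDetail(string $html): string
{
    $dom = new DOMDocument();
    // Suppress warnings from imperfect real-world markup, as in the code above.
    @$dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    // Copy the node list into an array so the loop is independent of the
    // document being modified.
    $nodes = iterator_to_array($xpath->query("//ul[@class='agentDetail']"));

    foreach ($nodes as $node) {
        // Removing the element also removes everything nested inside it.
        $node->parentNode->removeChild($node);
    }

    return $dom->saveHTML();
}

// Sample input, made up for this example.
$html = '<html><body><p>Keep me</p><ul class="agentDetail"><li>Secret</li></ul></body></html>';
echo removeAgentDetail($html);
```

Note that `@class='agentDetail'` only matches when the class attribute is exactly that string; an element with several classes would need a different expression.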
gt
Check out [`libxml_use_internal_errors`](http://de3.php.net/manual/en/function.libxml-use-internal-errors.php) for an alternative to suppressing the errors that `loadHTML` might raise.
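
A rough sketch of what that alternative might look like, assuming you want to inspect the parser's complaints rather than silence them with `@` (the sample markup is made up):

```php
<?php
// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>Unclosed paragraph<ul class="agentDetail"></ul></body></html>');

// Inspect what the parser complained about, if anything.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), " (line {$error->line})\n";
}

// Empty the error buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```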
Gordon
A: 

Are you hosting this page directly, or are you reading it into PHP and echoing it after parsing? In the former case, you can give the file a .php extension and enclose those lines in a `<?php if(0): ?>` / `<?php endif; ?>` block, so PHP never outputs them:

<?php if(0): ?>
<ul class="agentDetail">
........
.......
........
</ul>
<?php endif; ?>
Amarghosh