ansaurus

Question

Answer 1

+1 A:

Try removing the spans directly from the DOM tree.

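$dom = new DOMDocument();
// Set parser options before loading so that they apply to the parse.
$dom->preserveWhiteSpace = false;
$dom->loadHTML($string);
// getElementsByTagName() returns a live DOMNodeList, so copy the nodes
// into a plain array before removing anything.
$elements = $dom->getElementsByTagName('span');
$spans = array();
foreach ($elements as $span) {
    $spans[] = $span;
}
// Removing nodes via the copy does not disturb the iteration.
foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}
echo $dom->saveHTML();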

Lukáš Lalinský 2009-10-04 10:18:20
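
If only some spans should go, one possible variation is to select them with DOMXPath. This is a minimal sketch, not part of the original answer: the shortened $html sample and the choice to match on the naslov_slike class (taken from the string posted later in this thread) are assumptions. The list returned by DOMXPath::query() is a static snapshot, so nodes can be removed while looping over it.

```php
<?php
// Assumed sample input, shortened from the string posted in this thread.
$html = '<p>Some photos<br>'
      . '<span class="naslov_slike">photo_1</span><br>'
      . '<img alt="photo_1" src="82.jpg"><br>'
      . '<span class="naslov_slike">photo_2</span><br>'
      . '<img alt="photo_2" src="90.jpg"></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// The query result is not live, so removing nodes inside the loop
// does not change the list being iterated.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//span[@class="naslov_slike"]') as $span) {
    $span->parentNode->removeChild($span);
}

// Count what is left, then print the document.
$remaining = $dom->getElementsByTagName('span')->length;
$images = $dom->getElementsByTagName('img')->length;
echo $dom->saveHTML();
```

Note that saveHTML() prints the whole document, including the doctype, html and body wrappers that loadHTML() adds around a fragment.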

That's it... thanks a lot! :)

ile 2009-10-04 11:08:04

Answer 2

+2 A:

If you don't need to use DOM, take a look at the comments on the strip_tags manual page.

David Kuridža 2009-10-04 10:21:48

You can't tell strip_tags which tags it should remove, only which tags it should *not* remove.

Lukáš Lalinský 2009-10-04 10:24:50
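
To illustrate the point about strip_tags, here is a small sketch with an assumed, shortened $html sample. The second argument lists the tags to keep, and the text inside a stripped tag is left in place, so this does not remove a span together with its contents.

```php
<?php
// Assumed sample input, shortened from the string posted in this thread.
$html = 'Some photos<br><span class="naslov_slike">photo_1</span><br>';

// The second argument names the tags to KEEP. The span tags are stripped,
// but the text they wrapped stays in the output.
$result = strip_tags($html, '<br>');

echo $result; // Some photos<br>photo_1<br>
```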

Correct, that's why I referred to the comments, where methods for stripping specific tags can be found.

David Kuridža 2009-10-04 10:31:26

If not DOM, then I'd have to use regular expressions. That's not what I really want :)

ile 2009-10-04 10:34:40

Answer 3

A:

@Lukáš Lalinský: This is the string with your code...

$string = '
    Some photos<br>
    <span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br />
    <img alt="photo_by_ile_IMG_1676-01" src="http://localhost/sinj.com.hr/img/blog/82.jpg" /><br />
    <span class="naslov_slike">photo_by_ile_IMG_1699-01</span><br />
    <img alt="photo_by_ile_IMG_1699-01" src="http://localhost/sinj.com.hr/img/blog/90.jpg" /><br />
    <span class="naslov_slike">photo_by_ile_IMG_1697-01</span><br /><img alt="photo_by_ile_IMG_1697-01" src="http://localhost/sinj.com.hr/img/blog/89.jpg" /><br /><span class="naslov_slike">photo_by_ile_IMG_1695-01</span><br />
    <img alt="photo_by_ile_IMG_1695-01" src="http://localhost/sinj.com.hr/img/blog/88.jpg" />

    ';

$dom = new DOMDocument;
$dom->loadHTML($string);
$dom->preserveWhiteSpace = false;
$spans = $dom->getElementsByTagName('span');

foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}

echo $dom->saveHTML();

It removes every second span... Any idea why?

ile 2009-10-04 10:33:29

It seems removeChild() breaks the iterator; I've updated my answer to fix this.

Lukáš Lalinský 2009-10-04 10:51:01

Answer 4

+1 A:

@ile - I've had that problem. The index of the foreach iterator keeps incrementing, while calling removeChild() on the DOM also removes the node from the DOMNodeList ($spans), because that list is live. So for every span you remove, the node list shrinks by one element and then the foreach counter is incremented by one. Net result: it skips every other span.

I'm sure there is a more elegant way, but this is how I did it: I copied the references from the DOMNodeList into a second array, where they are not affected by the removeChild() operation.

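    // Copy the node references out of the live DOMNodeList first.
    $nodes = array();
    foreach ($spans as $span) {
        $nodes[] = $span;
    }
    // Removing via the plain array leaves the iteration intact.
    foreach ($nodes as $span) {
        $span->parentNode->removeChild($span);
    }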

kander 2009-10-04 10:48:44
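
Another way to deal with the shrinking list described above, sketched here under assumptions (the $html sample is made up for illustration), is to walk the live DOMNodeList from the end. Each removal then only affects positions that have already been visited, so no second array is needed.

```php
<?php
// Hypothetical sample input with four spans.
$html = '<p><span>a</span><img alt="a" src="a.jpg">'
      . '<span>b</span><img alt="b" src="b.jpg">'
      . '<span>c</span><span>d</span></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Live list: it shrinks every time a span is removed from the document.
$spans = $dom->getElementsByTagName('span');

// Iterate backwards so the indexes still to be visited stay valid.
for ($i = $spans->length - 1; $i >= 0; $i--) {
    $span = $spans->item($i);
    $span->parentNode->removeChild($span);
}

$remaining = $spans->length;
echo $dom->saveHTML();
```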

I see... Although I must confess I didn't know exactly how a foreach loop works. Now it's a bit clearer. Thank you!

ile 2009-10-04 11:09:33

Strip HTML tags and its contents
