I'm using DOM to parse a string. I need a function that strips span tags along with their contents. For example, if I have:

This is some text that contains photo.
<span class='title'> photobyile</span>

I would like the function to return:

This is some text that contains photo.

This is what I tried:

    $dom = new domDocument;
    $dom->loadHTML($string);
    $dom->preserveWhiteSpace = false;
    $spans = $dom->getElementsByTagName('span');

    foreach($spans as $span)
    {
     $naslov = $span->nodeValue; 
     echo $naslov;

     $string = preg_replace("/$naslov/", " ", $string);
    }

I'm aware that $span->nodeValue returns only the text content of the span and not the whole tag, but I don't know how to get the whole tag, including the class attribute.

Thanks, Ile

+1  A: 

Try removing the spans directly from the DOM tree.

    $dom = new DOMDocument();
    $dom->loadHTML($string);
    $dom->preserveWhiteSpace = false;
    $elements = $dom->getElementsByTagName('span');
    // Copy the nodes into a plain array first: the DOMNodeList is live,
    // so removing nodes while iterating over it directly skips elements.
    $spans = array();
    foreach ($elements as $span) {
        $spans[] = $span;
    }
    foreach ($spans as $span) {
        $span->parentNode->removeChild($span);
    }
    echo $dom->saveHTML();
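
Since the question asks for a function that returns the stripped string, here is a minimal sketch of how the same idea might be wrapped up. Note that saveHTML() on the whole document also emits the doctype and the html/body wrappers that loadHTML() adds, so this version serializes only the children of a wrapper element. The function name stripSpans and the wrapper div are assumptions made for illustration, not part of the answer above, and the sketch assumes ASCII or Latin-1 input, which is what loadHTML() expects by default.

```php
<?php
// Hypothetical helper: remove every <span> (with its contents) from an
// HTML fragment and return the remaining markup as a string.
function stripSpans($html)
{
    $dom = new DOMDocument();
    // Wrap the fragment in a div so there is a known container to read
    // back from. The @ suppresses warnings on sloppy markup.
    @$dom->loadHTML('<div>' . $html . '</div>');
    $wrapper = $dom->getElementsByTagName('div')->item(0);

    // Copy the live node list into an array before removing anything.
    $spans = array();
    foreach ($dom->getElementsByTagName('span') as $span) {
        $spans[] = $span;
    }
    foreach ($spans as $span) {
        $span->parentNode->removeChild($span);
    }

    // Serialize only the wrapper's children (saveHTML($node) needs PHP 5.3.6+).
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

$input = "This is some text that contains photo.\n<span class='title'> photobyile</span>";
echo trim(stripSpans($input));
```

Any other markup in the fragment (br, img and so on) is kept, though libxml may normalize how it is written out.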
Lukáš Lalinský
That's it... thanks a lot! :)
ile
+2  A: 

If you don't need to use DOM, take a look at the comments on the strip_tags manual page.

David Kuridža
You can't tell strip_tags which tags it should remove, only which tags it should *not* remove.
Lukáš Lalinský
Correct, that's why I referred to the comments, where methods for stripping specific tags can be found.
David Kuridža
If not DOM, then I'd have to use regular expressions. That's not what I really want :)
ile
A: 

@Lukáš Lalinský: This is the string, together with your code...

    $string = '
    Some photos<br>
    <span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br />
    <img alt="photo_by_ile_IMG_1676-01" src="http://localhost/sinj.com.hr/img/blog/82.jpg" /><br />
    <span class="naslov_slike">photo_by_ile_IMG_1699-01</span><br />
    <img alt="photo_by_ile_IMG_1699-01" src="http://localhost/sinj.com.hr/img/blog/90.jpg" /><br />
    <span class="naslov_slike">photo_by_ile_IMG_1697-01</span><br /><img alt="photo_by_ile_IMG_1697-01" src="http://localhost/sinj.com.hr/img/blog/89.jpg" /><br /><span class="naslov_slike">photo_by_ile_IMG_1695-01</span><br />
    <img alt="photo_by_ile_IMG_1695-01" src="http://localhost/sinj.com.hr/img/blog/88.jpg" />

    ';

    $dom = new domDocument;
    $dom->loadHTML($string);
    $dom->preserveWhiteSpace = false;
    $spans = $dom->getElementsByTagName('span');

    foreach($spans as $span)
    {

     $span->parentNode->removeChild($span);
    }

    echo $dom->saveHTML();

It removes every second span... Any idea why?

ile
It seems removeChild() breaks the iterator; I've updated my answer to fix this.
Lukáš Lalinský
+1  A: 

@ile - I've had that problem. It happens because the index of the foreach iterator keeps incrementing, while calling removeChild() on the DOM also removes the node from the DOMNodeList ($spans), which is a live list. So for every span you remove, the node list shrinks by one element and then the foreach counter advances by one. Net result: it skips every other span.
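
To make the shrinking list concrete, here is a small sketch that removes a single span and checks the list before and after. The four-span markup is made up for the demonstration.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div><span>a</span><span>b</span><span>c</span><span>d</span></div>');
$spans = $dom->getElementsByTagName('span');

$before = $spans->length;                // all four spans are in the list

$first = $spans->item(0);
$first->parentNode->removeChild($first); // remove span "a" from the document

$after = $spans->length;                 // the same list object now has three
$newFirst = $spans->item(0)->nodeValue;  // index 0 now points at span "b"

echo "$before -> $after, item(0) is now '$newFirst'\n";
```

Because "b" has moved into index 0, a loop that advances to index 1 next never visits it.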

I'm sure there is a more elegant way, but this is how I did it - I copied the references from the DOMNodeList into a second array, where they are not affected by the removeChild() operation.

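    // Copy the node references into a plain array first.
    $nodes = array();
    foreach ($spans as $span) {
        $nodes[] = $span;
    }
    // Removing nodes no longer disturbs the collection being iterated.
    foreach ($nodes as $span) {
        $span->parentNode->removeChild($span);
    }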
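
As one possible alternative that avoids the second array, the live list can be walked from the end towards the start, so a removal only shifts items that have already been handled. This is a sketch with made-up sample markup, not something taken from the thread.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div>text<span>a</span><br><span>b</span><span>c</span></div>');
$spans = $dom->getElementsByTagName('span');

// Iterate backwards: removing item $i leaves items 0..$i-1 in place.
for ($i = $spans->length - 1; $i >= 0; $i--) {
    $span = $spans->item($i);
    $span->parentNode->removeChild($span);
}

$remaining = $dom->getElementsByTagName('span')->length;
echo "spans left: $remaining\n";
```

This also copes with nested spans, since an inner span comes later in document order and is therefore removed before its parent.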
kander
I see... Although I must confess I didn't know exactly how the foreach loop works. Now it's a bit clearer. Thank you!
ile