I originally asked a question along these lines using regex, but was advised to use the PHP DOM library instead. That is the better approach, but I am still stuck.

Basically I want to wrap the contents of an <a> in a <span> if it is not already wrapped in <span>.

<?php
$input = <<<EOT
<html><head></head>
<body bgcolor="#393a36">
    <a href="#"><span style="color:#ffffff;">Link 1</span></a>
    <a href="#">Link 2</a>
    <a href="#"><img src="mypic.gif" />Image Link</a>
    <a href="#"><u>Underlined Link</u></a>
</body>
</html>
EOT;


$doc = new DOMDocument();
$doc->loadHTML($input);
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
    $spancount = $tag->getElementsByTagName("span")->length;
    if($spancount == 0){
        $content = nodeContent($tag);
        $element = $doc->createElement('span');
        $element->setAttribute('style','color:#ffffff;');
        $frag = $doc->createDocumentFragment();
        $frag->appendXML($content);
        $element->appendChild($frag);   
        $tag->nodeValue = ""; //clear node
        $tag->appendChild($element);
    }
}
echo $doc->saveHTML();

function nodeContent($n, $outer=false) { 
    $d = new DOMDocument('1.0'); 
    $d->formatOutput = true;
    $b = $d->importNode($n->cloneNode(true),true); 
    $d->appendChild($b);
    $h = $d->saveHTML(); 
    // remove outer tags
    if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4)); 
    return $h; 
} 

It provides this output:

PHP Warning: DOMDocumentFragment::appendXML(): Entity: line 1: parser error : Premature end of data in tag img line 1 in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 24
PHP Warning: DOMDocumentFragment::appendXML(): <img src="mypic.gif">Image Link in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 24
PHP Warning: DOMDocumentFragment::appendXML(): ^ in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 24
PHP Warning: DOMNode::appendChild(): Document Fragment is empty in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 25
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head></head>
<body bgcolor="#393a36">
<a href="#"><span style="color:#ffffff;">Link 1</span></a>
<a href="#"><span style="color:#ffffff;">Link 2</span></a>
<a href="#"><span style="color:#ffffff;"></span></a>
<a href="#"><span style="color:#ffffff;"><u>Underlined Link</u></span></a>
</body>
</html>

This mostly works, except that it is really picky: as you can see, it fails if there is an <img> (or similar) tag inside the <a>.

What is the best way to make this work? I've been banging my head against this for an embarrassingly long time now.

EDIT
Based on feedback below, here is the revised code and output. Note that the text preceding the img tag isn't being wrapped for some reason. Any ideas?

$doc = new DOMDocument();
$doc->loadHTML($input);
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
    $spancount = $tag->getElementsByTagName("span")->length;
    if($spancount == 0){
        $element = $doc->createElement('span');
        $element->setAttribute('style','color:#ffffff;');
        foreach ($tag->childNodes as $child) {
            $tag->removeChild($child);
            $element->appendChild($child);
        }
        $tag->appendChild($element);
    }
}
echo $doc->saveHTML();

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head></head>
<body bgcolor="#393a36">
    <a href="#"><span style="color:#ffffff;">Link 1</span></a>
    <a href="#"><span style="color:#ffffff;">Link 2</span></a>
    <a href="#">Image Link<span style="color:#ffffff;"><img src="mypic.gif"></span></a>
    <a href="#"><span style="color:#ffffff;"><u>Underlined Link</u></span></a>
</body>
</html>
A: 

Why bother with re-creating the node? Why not just replace the node? (If I understand what you're trying to do)...

if($spancount == 0){
    $element = $doc->createElement('span');
    $element->setAttribute('style','color:#ffffff;');
    $tag->parentNode->replaceChild($element, $tag);
    $element->appendChild($tag);
}

Edit: Whoops, it looks like you're trying to wrap everything under $tag in the span... Try this instead:

if($spancount == 0){
    $element = $doc->createElement('span');
    $element->setAttribute('style','color:#ffffff;');
    foreach ($tag->childNodes as $child) {
        $tag->removeChild($child);
        $element->appendChild($child);
    }
    $tag->appendChild($element);
}

Edit2: Based on your results, it looks like that foreach is not completing because of the node removal (childNodes is a live list, so it changes underneath the loop as children are removed)... Try replacing the foreach with this:

while ($tag->childNodes->length > 0) {
    $child = $tag->childNodes->item(0);
    $tag->removeChild($child);
    $element->appendChild($child);
}
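
Putting the pieces together, here is a minimal end-to-end sketch of the approach. It assumes the sample markup from the question; the `$html` variable is only there for illustration, and the span check is the same `getElementsByTagName("span")` test used above, so a link with a span anywhere inside it is left alone.

```php
<?php
// Assumption: same sample markup as in the question.
$input = <<<EOT
<html><head></head>
<body bgcolor="#393a36">
    <a href="#"><span style="color:#ffffff;">Link 1</span></a>
    <a href="#">Link 2</a>
    <a href="#"><img src="mypic.gif" />Image Link</a>
    <a href="#"><u>Underlined Link</u></a>
</body>
</html>
EOT;

$doc = new DOMDocument();
$doc->loadHTML($input);

foreach ($doc->getElementsByTagName('a') as $tag) {
    // Skip links that already contain a <span>.
    if ($tag->getElementsByTagName('span')->length > 0) {
        continue;
    }

    $element = $doc->createElement('span');
    $element->setAttribute('style', 'color:#ffffff;');

    // Always take the first remaining child: childNodes is a live list,
    // so it shrinks each time a child is removed.
    while ($tag->childNodes->length > 0) {
        $child = $tag->childNodes->item(0);
        $tag->removeChild($child);
        $element->appendChild($child);
    }

    // The <a> is now empty; put the filled <span> back into it.
    $tag->appendChild($element);
}

$html = $doc->saveHTML(); // hypothetical variable name, used for illustration
echo $html;
```

With this, the image link should come out with both the `<img>` and the text inside the new span, since every child node is moved rather than re-parsed.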
ircmaxell
Thanks, that almost works. See the edited OP above. In the case of the link with the `<img />` tag in it, the text is not getting wrapped in the new `<span>`, just the `<img />`.
Caleb Larsen
@Caleb Larsen: I've edited in a potential solution...
ircmaxell
Brilliant! That works great. Thank you!
Caleb Larsen