
I am writing a regex find/replace that inserts a <span> into every <a href> in a file that does not already contain a <span>. Other tags, such as <img> and <b>, may also appear inside the <a href>.

Currently I have this regex:
Find: (<a[^>]+?style=".*?color:#(\w{6}).*?".*?>)(.+?)(<\/a>)
Replace: '$1<span style="color:#$2;">$3</span>$4'

It works well, except that if I run it over the same file a second time, it inserts a <span> inside the existing <span> and the markup gets messy.
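To see the problem concretely, here is the question's find/replace translated to PHP's `preg_replace()` (an assumption, since the original tool isn't named) and applied twice to the second target example. The wrapper name `regexWrap()` is hypothetical.

```php
<?php
// Reproduces the double-wrapping problem: the pattern and replacement are
// the ones from the question, run through preg_replace().
// regexWrap() is a hypothetical helper name.
function regexWrap(string $html): string
{
    $find    = '/(<a[^>]+?style=".*?color:#(\w{6}).*?".*?>)(.+?)(<\/a>)/';
    $replace = '$1<span style="color:#$2;">$3</span>$4';
    return preg_replace($find, $replace, $html);
}

$link = '<a href="http://mywebiste.com/link1.html" target="_blank" '
      . 'style="color:#bfbcba; text-decoration:underline;">Howdy</a>';

$once  = regexWrap($link);  // one <span> around "Howdy"
$twice = regexWrap($once);  // the regex cannot tell a span is already there,
                            // so a second <span> wraps the first one

echo $once, "\n", $twice, "\n";
```

Nothing in the pattern looks at what `(.+?)` captured, which is why a second run has no way to skip anchors it already processed.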

Target examples:

We want it to ignore this:
<a href="http://mywebiste.com/link1.html" target="_blank" style="color:#bfbcba; text-decoration:underline;"><span style="color:#bfbcba;">Howdy</span></a>

But not this:
<a href="http://mywebiste.com/link1.html" target="_blank" style="color:#bfbcba; text-decoration:underline;">Howdy</a>

Or this:
<a href="http://mywebiste.com/link1.html" target="_blank" style="color:#bfbcba; text-decoration:underline;"><img src="myimg.gif" />Howdy</a>

--EDIT--

Using the PHP DOM library as suggested in the comments, this is what I have so far:

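$doc = new DOMDocument();
$doc->loadHTML($input);                  // $input holds the HTML to process
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
    // Count <span> elements anywhere inside this anchor
    $spancount = $tag->getElementsByTagName('span')->length;
    if ($spancount === 0) {
        // Appends an empty <span>; the anchor's existing contents are not moved into it yet
        $element = $doc->createElement('span');
        $tag->appendChild($element);
    }
}

echo $doc->saveHTML();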

Currently it detects whether an anchor contains a span and, if it doesn't, appends an empty span inside the anchor. However, I have yet to figure out how to get the anchor's original contents inside the span.

A:

Don't use regex for this; regular expressions are a poor fit for parsing HTML.

Use a DOM library: call getElementsByTagName('a'), then iterate through each anchor and check whether it contains a descendant span by calling getElementsByTagName('span') on it and testing the length property. If it doesn't, create a new span with createElement('span'), move the anchor's child nodes into it with appendChild (taking the anchor's firstChild until none remain), and then append the span to the anchor.
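A minimal sketch of that approach, assuming PHP's built-in DOM extension. The helper name `wrapAnchorContents()` is hypothetical, and copying the anchor's `color:#xxxxxx` value into the span's style is an assumption made to mirror the question's regex.

```php
<?php
// Sketch: wrap each anchor's contents in a <span> unless it already has one.
// wrapAnchorContents() is a hypothetical helper name.
function wrapAnchorContents(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    // Copy the node list into a plain array before the tree is modified.
    $anchors = iterator_to_array($doc->getElementsByTagName('a'));
    foreach ($anchors as $a) {
        // Skip anchors that already contain a <span> anywhere inside them.
        if ($a->getElementsByTagName('span')->length > 0) {
            continue;
        }
        $span = $doc->createElement('span');
        // Assumption: reuse the anchor's 6-digit colour, as the regex did.
        if (preg_match('/color:#(\w{6})/', $a->getAttribute('style'), $m)) {
            $span->setAttribute('style', 'color:#' . $m[1] . ';');
        }
        // appendChild() moves a node, so this drains the anchor's children
        // into the span one by one, keeping <img>, <b>, etc. intact.
        while ($a->firstChild !== null) {
            $span->appendChild($a->firstChild);
        }
        $a->appendChild($span);
    }
    return $doc->saveHTML();
}
```

Because anchors that already contain a span are skipped, running the function over its own output does not add a second span. Note that `saveHTML()` on a fragment also emits the DOCTYPE and `<html><body>` wrapper that `loadHTML()` added.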

EDIT: To grab the inner HTML of the anchor when it contains many child nodes, try this:

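<?php
// Returns the serialized HTML of all child nodes of $node
// (saveHTML() accepts a node argument as of PHP 5.3.6).
function innerHTML(DOMNode $node) {
  $html = '';
  foreach ($node->childNodes as $child)
    $html .= $node->ownerDocument->saveHTML($child);

  return $html;
}

$html = innerHTML( $anchorRef ); // $anchorRef: the anchor element being processed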

This may also help you out: http://stackoverflow.com/questions/2778110/change-innerhtml-of-a-php-domelement

meder
Fully agree: regex and HTML = bad. Though I would probably use an HTML parser, or even SimpleXML, instead of JavaScript, for the sake of people who use Lynx.
Robin
Thanks for the DOM suggestions. I have started using PHP DOM (for the first time!) and I am having a heck of a time figuring out how to take the contents of an element, e.g. `<b>my link</b>` in `<a href="link.html"><b>my link</b></a>`, and wrap them in a span. I've had no problem creating the new span element and appending it, but getting the original contents inside the `<span>` has me stumped.
Caleb Larsen
Well, it would be much easier for me (and others) to help if you posted your attempt in your original question.
meder
Good call. Please see the edited post. Thanks for your help!
Caleb Larsen
One problem I'm having with the above `innerHTML()` function is that it returns a string, and when I set the `nodeValue` of the new span to that string, the HTML gets escaped, like: `<a href="#"><u>Underlined Link</u><span style="color:#ffffff;"><u>Underlined Link</u></span></a>` The body of my `foreach` loop now looks like this: ` $element = $doc->createElement('span'); $content = innerHTML($tag); $element->setAttribute('style','color:#ffffff;'); $element->nodeValue = $content; $tag->nodeValue = ""; //clear node $tag->appendChild($element);`
Caleb Larsen
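The escaping described in that last comment happens because `nodeValue` stores the string as a plain text node rather than parsing it as markup. The sketch below, assuming PHP's built-in DOM extension, contrasts that with `DOMDocumentFragment::appendXML()`, which does parse the string but only accepts well-formed XML. Variable names are illustrative.

```php
<?php
// Demo of why nodeValue escapes markup, versus parsing it via a fragment.
$doc  = new DOMDocument();
$root = $doc->createElement('div');
$doc->appendChild($root);

// 1) nodeValue: the string becomes a text node, so "<" is serialized as "&lt;".
$escaped = $doc->createElement('span');
$escaped->nodeValue = '<u>Underlined Link</u>';
$root->appendChild($escaped);

// 2) Document fragment: appendXML() parses the string into real nodes.
//    It requires well-formed XML, so HTML-style output such as an
//    unclosed <img src="myimg.gif"> from saveHTML() may be rejected.
$parsed   = $doc->createElement('span');
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<u>Underlined Link</u>');
$parsed->appendChild($fragment);
$root->appendChild($parsed);

$escapedHtml = $doc->saveHTML($escaped); // text content, markup escaped
$parsedHtml  = $doc->saveHTML($parsed);  // a real <u> element inside the span

echo $escapedHtml, "\n", $parsedHtml, "\n";
```

Moving the anchor's existing child nodes with `appendChild()` sidesteps the string round-trip entirely, which avoids both the escaping and the well-formedness restriction.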