Hi!

I have a situation in which I parse a body of text and replace certain phrases with links. I then need to re-parse the string to replace a second set of phrases with links. The problem arises at this point, where certain words or phrases in the second set can be substrings of phrases already replaced in the first pass.

Example: The string "blah blah grand canyon blah" will become "blah blah <a href="#">grand canyon</a> blah" after the first pass. The second pass might try to replace the word "canyon" with a link, so the resulting, broken, text would read: "blah blah <a href="#">grand <a href="#">canyon</a></a> blah".
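For reference, the behaviour described above can be reproduced with a minimal sketch like the following. The patterns and the variable names ($text, $pass1, $pass2) are assumptions made for illustration, not the actual code from the project.

```php
<?php
// Naive two-pass replacement that reproduces the nesting problem.
$text = 'blah blah grand canyon blah';

// First pass: link the longer phrase.
$pass1 = preg_replace('/\bgrand canyon\b/', '<a href="#">$0</a>', $text);

// Second pass: link "canyon", unaware that it now sits inside a link.
$pass2 = preg_replace('/\bcanyon\b/', '<a href="#">$0</a>', $pass1);

echo $pass2, "\n";
```

The second call sees only a string, so it has no way of knowing that "canyon" is already part of a link.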

So I've been trying to use preg_replace and a regular expression to prevent nested <a> tags from occurring, by only replacing text which is not already in a link. I have tried regexes that check whether there are </a> tags further on in the text, but I can't get these to work.

Maybe another approach is required?

Many thanks in advance! Dave

+2  A: 

This looks very very close to this question

Paul
Wow, thank you that is exactly what we needed!
I just posted a PHP version of that solution to your question.
Jan Goyvaerts
+1  A: 

This might work for all passes:

$string = preg_replace('/([^>]|^)grand canyon\b/','$1<a href=#>grand canyon</a>',$string);

EDIT: This assumes you can afford to miss a match when the text contains something like "amazonas>grand canyon".
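To make the trade-off concrete, here is a small sketch of how that pattern behaves on three inputs. The sample strings and variable names are made up for illustration; only the pattern and replacement come from the line above.

```php
<?php
$pattern     = '/([^>]|^)grand canyon\b/';
$replacement = '$1<a href=#>grand canyon</a>';

// Plain text: the phrase is preceded by a space, so it gets linked.
$linked = preg_replace($pattern, $replacement, 'blah blah grand canyon blah');

// Already linked: the phrase directly follows ">", so it is left alone.
$unchanged = preg_replace($pattern, $replacement, '<a href=#>grand canyon</a>');

// The caveat: a literal ">" right before the phrase also blocks the match.
$missed = preg_replace($pattern, $replacement, 'amazonas>grand canyon');

echo $linked, "\n", $unchanged, "\n", $missed, "\n";
```

The check only looks at the single character before the phrase, so it protects a phrase that starts immediately after the opening tag and nothing else.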

PEZ
Actually my first answer didn't work. Now fixed.
PEZ
A: 

For the second pass, you could use a regex such as:

(<a[^>]*>.*?</a>)|grand

This regex matches either a link, or the word "grand". If the link is matched, it is captured into the first (and only) capturing group. If the group matched, simply re-insert the existing link. If the word grand matches, you know it's outside a link, and you can turn it into a link.

In PHP you can do this with preg_replace_callback:

$result = preg_replace_callback('%(<a[^>]*>.*?</a>)|grand%', 'compute_replacement', $subject);

function compute_replacement($groups) {
    // You can vary the replacement text for each match on-the-fly
    // $groups[0] holds the regex match
    // $groups[n] holds the match for capturing group n
    if (!empty($groups[1])) {
        // Group 1 matched: this is an existing link, so re-insert it unchanged
        return $groups[1];
    } else {
        // Only the word matched, so it is outside any link: wrap it
        return "<a href='#'>{$groups[0]}</a>";
    }
}
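As a rough sketch of how the same idea could be applied to the example from the question, the snippet below wraps it in a helper that takes the phrase as a parameter. The function name link_outside_anchors, its parameters, and the added \b, i and s options are assumptions for illustration, not part of the original suggestion.

```php
<?php
// Hypothetical helper: link every occurrence of $phrase that is not
// already inside an <a>...</a> element.
function link_outside_anchors(string $text, string $phrase, string $href = '#'): string
{
    // Alternative 1 captures a whole existing link; alternative 2 is the phrase.
    // "s" lets .*? span newlines inside a link, "i" ignores case.
    $pattern = '%(<a\b[^>]*>.*?</a>)|\b' . preg_quote($phrase, '%') . '\b%is';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($href): string {
            // Group 1 is only set when an existing link matched: keep it as-is.
            if (!empty($m[1])) {
                return $m[1];
            }
            // Otherwise the phrase matched outside any link: wrap it.
            return '<a href="' . htmlspecialchars($href) . '">' . $m[0] . '</a>';
        },
        $text
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

$text  = 'blah blah grand canyon blah canyon';
$pass1 = link_outside_anchors($text, 'grand canyon');
$pass2 = link_outside_anchors($pass1, 'canyon');
echo $pass2, "\n";
```

Because every pass skips over complete existing links, the same helper can be used for the first pass, the second pass, and any later ones. It assumes the links in the text are themselves not nested.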
Jan Goyvaerts