I would like to replace "&gt;" with ">" and "&lt;" with "<", but only when they occur outside <pre>...</pre> blocks. Is this possible?

$newText = preg_replace('/&gt;/', '>', $text);

I would be using preg_replace in PHP, as above.

A: 

I'm not sure offhand whether PHP's regex engine supports negative lookarounds, but that's what you want here. In other languages the regex would look something like:

/(?<!(<pre>[^(<\/pre>)]*))XXX(?!(.*<\/pre>))/

(inhale - I think I have that right)

where XXX is your pattern, "&lt;" or "&gt;"

N.B. There is likely an even simpler pattern too.

annakata
Unfortunately, PHP's regex engine doesn't let you put lookahead or lookbehind assertions inside a character class. I haven't tested the code above, but it looks like a compile nightmare (at least in PHP). PHP also requires lookbehind assertions to be fixed width.
localshred
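
To illustrate the point about lookarounds: the fixed-width restriction applies to lookbehind, not lookahead, so a lookahead-only pattern is one possible workaround. The sketch below is only that, a sketch. The function name decodeOutsidePre is made up for this example, and it assumes well-formed, non-nested <pre>...</pre> blocks with no attributes on the opening tag.

```php
<?php
// Hypothetical helper, not from the thread. Assumes well-formed, non-nested
// <pre>...</pre> blocks with no attributes on the opening tag.
function decodeOutsidePre(string $text): string
{
    // Shared tail: "not followed by a </pre> that can be reached
    // without first passing another <pre>", i.e. we are not inside a block.
    $notInsidePre = '(?!(?:(?!<pre>).)*</pre>)';

    // The two patterns are applied one after the other.
    return preg_replace(
        ['%&gt;' . $notInsidePre . '%si', '%&lt;' . $notInsidePre . '%si'],
        ['>', '<'],
        $text
    );
}

$sample = "<pre>a &gt; b</pre> c &lt; d &gt; e <pre>&lt;</pre>";
echo decodeOutsidePre($sample), "\n";
// prints: <pre>a &gt; b</pre> c < d > e <pre>&lt;</pre>
```

The lookahead rescans the rest of the string for every entity it finds, so this may be slow on large inputs.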
+3  A: 

This isn't really an answer because you asked for a regex, but I just wrote a really dirty function to do it:

<?php

$html = ' <pre>hello &gt; &lt;</pre>
            &gt; &lt;
            <pre></pre>';


function stringReplaceThing($str) {
    $offset = 0;
    $num = 0;
    $preContents = array();
    $length = strlen($str);

    //copy string so can maintain offsets/positions in the first string after replacements are made
    $str2=$str;

    //get next position of <pre> tag
    while (false !== ($startPos = stripos($str, '<pre>', $offset))) {
        //the end of the opening <pre> tag
        $startPos += 5;

        //try to get closing tag
        $endPos = stripos($str, '</pre>', $startPos);

        if ($endPos === false) {
            die('unclosed pre tag..');
        }

        //take the whole block, tags included, so an identical string outside <pre> is never tokenised
        $blockStart = $startPos - 5;
        $preBlock = substr($str, $blockStart, $endPos + 6 - $blockStart);
        //replace the block with some sort of token
        $token = "!!T{$num}!!";
        $str2 = str_replace($preBlock, $token, $str2);
        $preContents[$token] = $preBlock;
        $num++;

        //continue after the closing </pre> tag (6 characters)
        $offset = $endPos + 6;
    }

    //do the actual replacement
    $str2 = str_replace(array('&gt;', '&lt;'), array('>', '<'), $str2);

    //put the contents of <pre></pre> blocks back in
    $str2 = str_replace(array_keys($preContents), array_values($preContents), $str2);
    return $str2;
}


print stringReplaceThing($html);
Tom Haigh
Because PHP's regex engine doesn't allow variable-width lookbehind assertions, I think this approach is probably the smartest. It's definitely more hands-on, but none of my other attempts at a regex answer fully solve the problem. I'll keep working on it though, don't worry :).
localshred
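
A shorter take on the same hands-on idea, for comparison: split the string on the <pre> blocks and touch only the pieces in between, so no placeholder tokens are needed. This is a minimal sketch, not a drop-in replacement. The name decodeOutsidePreBySplit is invented here, and it assumes <pre> blocks are well formed and not nested. An unclosed <pre> is simply treated as ordinary text.

```php
<?php
// Hypothetical helper, not from the thread. Assumes well-formed, non-nested <pre> blocks.
function decodeOutsidePreBySplit(string $html): string
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the captured <pre>...</pre> blocks in the
    // result: even indexes hold text outside <pre>, odd indexes hold the blocks.
    $parts = preg_split('%(<pre>.*?</pre>)%si', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Outside any <pre> block: decode the two entities.
            $parts[$i] = str_replace(['&gt;', '&lt;'], ['>', '<'], $part);
        }
    }

    return implode('', $parts);
}

$html = "<pre>hello &gt; &lt;</pre>\n&gt; &lt;\n<pre></pre>";
echo decodeOutsidePreBySplit($html);
```

With the sample input, only the middle line should change; both <pre> blocks come back untouched.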
+2  A: 

If you want to do this with a regex, the trick is to make your regex match the things you don't want to replace as well as the things you want to replace, and dynamically calculate the replacement depending on what was matched.

$new_text = preg_replace_callback('%&lt;|&gt;|<pre>.*?</pre>%si', 'compute_replacement', $text);

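function compute_replacement($groups) {
    // $groups[0] holds the whole match; the pattern has no capturing groups
    if ($groups[0] == '&lt;') {
      return '<';
    } elseif ($groups[0] == '&gt;') {
      return '>';
    } else {
      // a whole <pre>...</pre> block: return it unchanged
      return $groups[0];
    }
}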
Jan Goyvaerts
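
For a quick check of the match-everything-then-decide idea, here is a small variant using an anonymous function and a lookup table, run against the sample string from the earlier answer. The variable names are illustrative only, and the same assumption applies: <pre> blocks are well formed and not nested.

```php
<?php
// Sample input borrowed from the earlier answer in this thread.
$text = "<pre>hello &gt; &lt;</pre>\n&gt; &lt;\n<pre></pre>";

// Lookup table for the two entities. Any other match is a whole
// <pre>...</pre> block and is returned untouched.
$map = ['&lt;' => '<', '&gt;' => '>'];

$newText = preg_replace_callback(
    '%&lt;|&gt;|<pre>.*?</pre>%si',
    function (array $m) use ($map) {
        return $map[$m[0]] ?? $m[0];
    },
    $text
);

echo $newText;
```

Because the <pre> alternative consumes the whole block in one match, the entities inside it are never offered to the callback on their own.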