I want to replace all the occurrences of a string that doesn't start with "<pre>" and doesn't end in "</pre>".

So let's say I wanted to find new-line characters and replace them with "<p/>". I can get the "not followed by" part:

var revisedHtml = html.replace(/[\n](?![<][/]pre[>])/g, "<p/>");

But I don't know the "not starting with" part to put at the front.

Any help please? :)

+1  A: 

What you need is a negative lookbehind: a zero-length assertion that ensures some condition is not true before the match. Unfortunately, JavaScript engines that predate ES2018 do not support lookbehinds. Take a look at this workaround:

JavaScript Negative Lookbehind Equivalent

Alex Blokh
A: 

Why not do the reverse? Look for all the substrings enclosed in <pre> tags. Then you know which parts of your string are not enclosed in <pre>.

EDIT: A more elegant solution: use split() with the <pre> blocks as the delimiters. This gives you the HTML outside the <pre> blocks.

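var s = "blah blah<pre>formatted</pre>blah blah<pre>another formatted</pre>end";
// [\s\S] matches any character, newlines included, so multi-line <pre> blocks are handled too
var rgx = /<pre>[\s\S]*?<\/pre>/g;
var nonPreStrings = s.split(rgx);
// Indexed loop: for...in would also visit inherited enumerable properties
for (var idx = 0; idx < nonPreStrings.length; idx++)
    alert(nonPreStrings[idx]);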
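
To get from the split pieces back to a full document, one option is to put a capturing group in the delimiter regex, since split() then keeps the delimiters in the result array. A minimal sketch, assuming the <pre> tags carry no attributes and are not nested; the function name replaceOutsidePre is made up for illustration:

```javascript
// Hypothetical helper: replace newlines with <p/> everywhere except inside <pre> blocks.
function replaceOutsidePre(html) {
    // The capturing group makes split() keep each <pre> block in the result,
    // so even indexes hold outside text and odd indexes hold <pre> blocks.
    var parts = html.split(/(<pre>[\s\S]*?<\/pre>)/);
    for (var i = 0; i < parts.length; i += 2) {
        parts[i] = parts[i].replace(/\n/g, "<p/>");
    }
    return parts.join("");
}

var sample = "one\ntwo<pre>a\nb</pre>three\nfour";
console.log(replaceOutsidePre(sample));
```

Only the even-indexed pieces are touched, so the newline inside the <pre> block is left alone.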
Jerome
A: 

Here's how Steve Levithan's first lookbehind-alternative can be applied to your problem:

var output = s.replace(/(<pre>[\s\S]*?<\/pre>)|\n/g, function($0, $1){
    return $1 ? $1 : '<p/>';
});

When it reaches a <pre> element, it captures the whole thing and plugs it right back into the output. It never really sees the newlines inside the element, just gobbles them up along with all other content. Thus, when the \n in the regex does match a newline, you know it's not inside a <pre> element, and should be replaced with a <p/>.
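
As a quick check of that behaviour, here is the same replace call wrapped in a function and run on a small sample. The function name and the sample string are illustrative only:

```javascript
// Illustrative wrapper around the replace call shown above.
function newlinesToParagraphs(s) {
    return s.replace(/(<pre>[\s\S]*?<\/pre>)|\n/g, function ($0, $1) {
        // $1 is set only when the <pre>...</pre> alternative matched
        return $1 ? $1 : '<p/>';
    });
}

var input = "first\nsecond<pre>keep\nthis</pre>\nlast";
console.log(newlinesToParagraphs(input));
```

The newline inside the <pre> block survives, while the two outside it become <p/>.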

But don't make the mistake of regarding this technique as a hack or a workaround; I would recommend this approach even if lookbehinds were available. With the lookaround approach, the regex has to examine every single newline and apply the lookarounds each time to see if it should be replaced. That's a lot of unnecessary work it has to do, plus the regex is a lot more complicated and less maintainable.

As always when using regexes on HTML, I'm ignoring a lot of factors that can affect the result, like SGML comments, CDATA sections, angle brackets in attribute values, etc. You'll have to determine which among those factors you have to deal with in your case, and which ones you can ignore. When it comes to processing HTML with regexes, there's no such thing as a general solution.
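
For instance, the pattern above only recognises a bare, lowercase <pre> tag. If your input may contain attributes or uppercase tags, one possible adjustment (an assumption about your markup, not a general fix) is to loosen the opening tag and make the match case-insensitive. The function name is hypothetical:

```javascript
// Sketch only: tolerates attributes and mixed case on the <pre> tags.
// Still ignores comments, CDATA, nested <pre>, and ">" inside attribute values.
function newlinesToParagraphsLoose(s) {
    return s.replace(/(<pre\b[^>]*>[\s\S]*?<\/pre\s*>)|\n/gi, function ($0, $1) {
        // Put matched <pre> blocks back untouched; replace everything else (a newline)
        return $1 ? $1 : '<p/>';
    });
}

console.log(newlinesToParagraphsLoose('a\n<PRE class="code">x\ny</PRE>\nb'));
```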

Alan Moore