I am trying to write a JavaScript regex to replace user-entered tags with real HTML tags, so [b] becomes <b> and so forth. The regex I am using looks like this:

var exptags = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/ig;

with the following JavaScript

s.replace(exptags,"<$1>$2</$1>");

This works fine for tags that are not nested, for example:

[b]hello[/b] [u]world[/u]

But if the tags are nested inside each other, it only matches the outer tags, for example:

[b]foo [u]to the[/u] bar[/b]

This only matches the b tags. How can I fix this? Should I just loop until the output is the same as the input? I also have a feeling that the ((.){1,}?) pattern is wrong.

Thanks

+1  A: 

AFAIK you can't express recursion with regular expressions.

You can however do that with .NET's System.Text.RegularExpressions using balanced matching. See more here: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx

If you're using .NET you can probably implement what you need with a callback. If not, you may have to roll your own little JavaScript parser.

Then again, if you can afford to hit the server you can use the full parser. :)

What do you need this for, anyway? If it is for anything other than a preview I highly recommend doing the processing server-side.

c15c8ra1n
Yes, it's for a live preview for a comments area. The server side is PHP, and I already have the code for that part.
Re0sless
A: 

Yes, you will have to loop. Alternatively, since your tags look so much like HTML ones, you could replace [b] with <b> and [/b] with </b> separately. (.){1,}? is the same as (.*?) - that is, any symbols, least possible sequence length.

Updated: Thanks to MrP, (.){1,}? is (.)+?, my bad.

vava
A: 

You are right about the inner pattern being troublesome.

((.){1,}?)

That captures a single character, repeats that group one or more times, and then captures the whole run again in an outer group. Every character inside your tag passes through the inner group, which ends up holding only the last one.

You are also capturing your closing element name when you don't need it and are using {1} when that is implied. Below is a cleaned-up version:

/[(b|u|i|s|center|code)](.+?)[\/\1]/ig

Not sure about the other problem.
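
The capturing behaviour described above can be checked directly. This is a small sketch, not part of the original answer: the variable names are made up for illustration, and the simplified pattern is written with its brackets escaped.

```javascript
// Compare what each pattern captures for the same input.
var input = "[b]abc[/b]";

// Original pattern: group 2 wraps the repetition, group 3 sits inside it,
// and group 4 wraps the backreference to the tag name.
var original = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/i.exec(input);

// Simplified pattern: one group for the tag name, one for the content.
var simplified = /\[(b|u|i|s|center|code)\](.+?)\[\/\1\]/i.exec(input);

console.log(original.slice(1));   // ["b", "abc", "c", "b"]
console.log(simplified.slice(1)); // ["b", "abc"]
```

The inner (.) group only keeps the last character it matched, so it adds nothing that group 2 does not already provide.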

Richard Szalay
A: 

You could just repeatedly apply the regexp until it no longer matches. That would do odd things like "[b][b]foo[/b][/b]" => "<b>[b]foo</b>[/b]" => "<b><b>foo</b></b>", but as far as I can see the end result will still be a sensible string with matching (though not necessarily properly nested) tags.

Or if you want to do it 'right', just write a simple recursive descent parser. Though people might expect "[b]foo[u]bar[/b]baz[/u]" to work, which is tricky to recognise with a parser.
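
For the parser route, here is a minimal sketch. It is a single left-to-right scan with a stack rather than a true recursive descent parser, and the function name convertNested is hypothetical. It assumes the same tag list as the question and leaves any tag it cannot pair up as plain text.

```javascript
// Hypothetical helper: converts properly nested tags in one scan.
function convertNested(input) {
  var tagPattern = /\[(\/?)(b|u|i|s|center|code)\]/gi;
  var stack = [];   // open tags still waiting for their closing tag
  var out = [];     // output pieces, joined at the end
  var last = 0;     // end of the previous tag in the input
  var m;

  while ((m = tagPattern.exec(input)) !== null) {
    out.push(input.slice(last, m.index)); // text before this tag
    last = tagPattern.lastIndex;
    var name = m[2].toLowerCase();

    if (m[1] === "") {
      // Opening tag: emit it as text for now and remember its slot.
      stack.push({ name: name, index: out.length });
      out.push(m[0]);
    } else if (stack.length > 0 && stack[stack.length - 1].name === name) {
      // Closing tag that matches the innermost open tag: convert both.
      var open = stack.pop();
      out[open.index] = "<" + name + ">";
      out.push("</" + name + ">");
    } else {
      // Closing tag with no matching innermost opener: keep as text.
      out.push(m[0]);
    }
  }

  out.push(input.slice(last)); // text after the last tag
  return out.join("");
}

console.log(convertNested("[b]foo [u]to the[/u] bar[/b]"));
// <b>foo <u>to the</u> bar</b>
console.log(convertNested("[b][b]foo[/b][/b]"));
// <b><b>foo</b></b>
console.log(convertNested("[b]foo[u]bar[/b]baz[/u]"));
// [b]foo<u>bar[/b]baz</u>
```

The last call shows the overlapping case: only the u pair is converted, and the b pair stays as text because its closing tag arrives while u is still open.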

Marijn
A: 

The nested block doesn't get replaced because a global replace resumes searching after the end of the previous match, which here is just past [/b]. Everything that ((.){1,}?) matched, including the inner tags, is therefore never examined again in that pass.

It is possible to write a recursive parser on the server side -- Perl uses qr// and Ruby probably has something similar.

Though, you don't necessarily need true recursion. A relatively simple loop handles the string equivalently:

var s = '[b]hello[/b] [u]world[/u] [b]foo [u]to the[/u] bar[/b]';
var exptags = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/ig;

while (s.match(exptags)) {
   s = s.replace(exptags, "<$1>$2</$1>");
}

document.writeln('<div>' + s + '</div>'); // after

In this case, it'll make 2 passes:

0: [b]hello[/b] [u]world[/u] [b]foo [u]to the[/u] bar[/b]
1: <b>hello</b> <u>world</u> <b>foo [u]to the[/u] bar</b>
2: <b>hello</b> <u>world</u> <b>foo <u>to the</u> bar</b>


Also, a few suggestions for cleaning up the RegEx:

var exptags = /\[(b|u|i|s|center|code)\](.+?)\[\/(\1)\]/ig;
  • {1} is assumed when no other count specifiers exist
  • {1,} can be shortened to +
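
As a variation on the loop above, the repetition can stop as soon as a pass leaves the string unchanged. This is only a sketch: the function name replaceUntilStable is hypothetical, and it assumes .*? in place of .+? so that an empty pair such as [center][/center] is converted too.

```javascript
// Hypothetical helper: repeat the replacement until a pass changes nothing.
function replaceUntilStable(s) {
  // Assumption: .*? rather than .+? so that empty pairs also match.
  var exptags = /\[(b|u|i|s|center|code)\](.*?)\[\/\1\]/ig;
  var previous;
  do {
    previous = s;
    s = s.replace(exptags, "<$1>$2</$1>");
  } while (s !== previous);
  return s;
}

console.log(replaceUntilStable("[b]foo [u]to the[/u] bar[/b]"));
// <b>foo <u>to the</u> bar</b>
console.log(replaceUntilStable("[b] foo [b]bar[/b] baz[/b] [center][/center]"));
// <b> foo <b>bar</b> baz</b> <center></center>
```

Each pass that changes the string removes at least one bracketed pair, so the loop ends after a finite number of passes.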
Jonathan Lonowski
If you add [center][/center] to your test case, and nest one tag inside another, and take my regex from below, I'll vote your answer up.
Joe Hildebrand
By "nest one tag inside another", I meant "the same tag inside itself", for example: [b] foo [b]bar[/b] baz[/b]
Joe Hildebrand
A: 

Agree with Richard Szalay, but his regex didn't get quoted right:

var exptags = /\[(b|u|i|s|center|code)](.*)\[\/\1]/ig;

is cleaner. Note that I also changed .+? to .*. There are two problems with .+?:

  1. you won't match [u][/u], since there isn't at least one character between them (+)
  2. a non-greedy match won't deal as nicely with the same tag nested inside itself (?)
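
To see how the two quantifiers behave in a single pass, here is a small comparison. It is a sketch with made-up variable names, and it adds a sibling-tag input that the list above does not discuss.

```javascript
var lazy = /\[(b|u|i|s|center|code)\](.*?)\[\/\1\]/ig;
var greedy = /\[(b|u|i|s|center|code)\](.*)\[\/\1\]/ig;
var template = "<$1>$2</$1>";

var sameTagNested = "[b] foo [b]bar[/b] baz[/b]";
var siblings = "[b]one[/b] and [b]two[/b]";

// Same tag nested inside itself: greedy pairs the outermost tags,
// lazy pairs the first opener with the first closer.
var nestedGreedy = sameTagNested.replace(greedy, template);
var nestedLazy = sameTagNested.replace(lazy, template);
console.log(nestedGreedy); // <b> foo [b]bar[/b] baz</b>
console.log(nestedLazy);   // <b> foo [b]bar</b> baz[/b]

// Two separate pairs on one line: greedy spans both pairs,
// lazy converts each pair on its own.
var siblingsGreedy = siblings.replace(greedy, template);
var siblingsLazy = siblings.replace(lazy, template);
console.log(siblingsGreedy); // <b>one[/b] and [b]two</b>
console.log(siblingsLazy);   // <b>one</b> and <b>two</b>
```

So the greedy form pairs the outer tags correctly for the same-tag case, but it also stretches across sibling pairs, which is worth testing against the kind of input you expect.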
Joe Hildebrand
A: 

(.){1,}? is the same as (.*?)

That is not true!

(.){1,}? is the same as (.)+?, but the intention was probably (.+?)

(Of course, this should be a comment on that answer, but I cannot comment yet...)

Victor
A: 

How about:

// The first group matches an optional "/" so opening and closing tags share one pattern.
var tagreg = /\[(\/?)(b|u|i|s|center|code)\]/gi;
"[b][i]helloworld[/i][/b]".replace(tagreg, "<$1$2>");
"[b]helloworld[/b]".replace(tagreg, "<$1$2>");

For me the above produces:

<b><i>helloworld</i></b>
<b>helloworld</b>

This appears to do what you want, and has the advantage of needing only a single pass.

Disclaimer: I don't code often in JS, so if I made any mistakes please feel free to point them out :-)

Cheers,
Steve

freespace
+2  A: 

The easiest solution would be to replace all the tags, whether they are closed or not, and let .innerHTML work out whether they are matched. It will be much more resilient that way.

var tagreg = /\[(\/?)(b|u|i|s|center|code)]/ig;
// div is assumed to be an existing DOM element
div.innerHTML = "[b][i]helloworld[/b]".replace(tagreg, "<$1$2>"); // no closing i
// div.innerHTML == "<b><i>helloworld</i></b>"
A Nony Mouse