I found some code online that converts BBCode url tags into hyperlinks. The code is:

// format the url tags: [url=www.website.com]my site[/url]
// becomes: <a href="www.website.com">my site</a>
exp = new Regex(@"\[url\=([^\]]+)\]([^\]]+)\[/url\]");
str = exp.Replace(str, "<a href=\"$1\">$2</a>");

// format the img tags: [img]www.website.com/img/image.jpeg[/img]
// becomes: <img src="www.website.com/img/image.jpeg" />
exp = new Regex(@"\[img\]([^\]]+)\[/img\]");
str = exp.Replace(str, "<img src=\"$1\" />");

I also want to convert ordinary links into hyperlinks. I searched some more and found this:

exp = new Regex("(http://[^ ]+)");
str = exp.Replace(str, "<a href=\"$1\">$1</a>");

The problem is that when I combine them and the third regular expression runs, it mangles the output of the first two. It can result in:

<img src="<a href='www.website.com/img/image.jpeg'>www.website.com/img/image.jpeg</a>" />

How can I specify in my third regular expression that it must not convert URLs that are already preceded by 'href="' or 'src="'?

+1  A: 

Given the interesting combinations of tags users could throw at you, regular expressions quickly become cumbersome and difficult to use for parsing tags.

BBCode is essentially a grammar unto itself, and the best way to interpret a grammar programmatically is with an actual parser.

Have a look at http://bbcode.codeplex.com/. I can't vouch for its effectiveness, but they've implemented a parser for BBCode in C# that might do what you're looking for.
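If pulling in a full parser is more than you need, one step in that direction is to handle every construct in a single pass instead of running three replacements in a row. The following is only a minimal sketch under my own assumptions, not how the library above works: the class name BbCodeSketch, the method Render, and the group names are made up for illustration, and text outside the matched tokens is passed through without HTML encoding.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class BbCodeSketch
{
    // One pattern with three alternatives, tried in order at each position:
    // [url=...]...[/url], then [img]...[/img], then a bare http:// link.
    private static readonly Regex Token = new Regex(
        @"\[url=(?<href>[^\]]+)\](?<text>[^\[]+)\[/url\]" +
        @"|\[img\](?<src>[^\[\]]+)\[/img\]" +
        @"|(?<bare>http://[^\s\[\]<>""]+)",
        RegexOptions.IgnoreCase);

    // Hypothetical entry point: converts the three constructs in one scan.
    public static string Render(string input)
    {
        // Regex.Replace walks the input once and never rescans the text it
        // has just emitted, so a URL written into href="..." or src="..."
        // cannot be linkified a second time.
        return Token.Replace(input, m =>
        {
            if (m.Groups["href"].Success)
            {
                return "<a href=\"" + WebUtility.HtmlEncode(m.Groups["href"].Value) + "\">"
                     + WebUtility.HtmlEncode(m.Groups["text"].Value) + "</a>";
            }
            if (m.Groups["src"].Success)
            {
                return "<img src=\"" + WebUtility.HtmlEncode(m.Groups["src"].Value) + "\" />";
            }
            // Only the bare-link alternative is left at this point.
            string url = WebUtility.HtmlEncode(m.Groups["bare"].Value);
            return "<a href=\"" + url + "\">" + url + "</a>";
        });
    }
}
```

With this, BbCodeSketch.Render("[img]http://www.website.com/img/image.jpeg[/img]") produces a plain img tag, because the whole [img]...[/img] span is consumed by the second alternative before the bare-link alternative ever sees the URL. It still does not handle nested or malformed tags, which is where a real parser earns its keep.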

Chris
Basic regular expressions also don't carry enough business logic to protect you against sly users. It's very easy for a user to submit something like [url]javascript:window.alert("lolol")[/url], or [img] http://path.to/a/script.php [/img], and cause all manner of havoc.
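As a rough illustration of the kind of check a regex alone won't give you, the captured address can be validated before it is written into href or src. This is a minimal sketch, assuming you only want to allow absolute http and https addresses; UrlCheckSketch and IsSafeUrl are hypothetical names, while Uri.TryCreate and the Uri.UriSchemeHttp / Uri.UriSchemeHttps constants are standard .NET.

```csharp
using System;

public static class UrlCheckSketch
{
    // Hypothetical helper: accepts only absolute http or https URLs.
    public static bool IsSafeUrl(string candidate)
    {
        if (string.IsNullOrWhiteSpace(candidate))
        {
            return false;
        }

        Uri uri;
        // TryCreate fails for text that is not an absolute URI; the scheme
        // comparison then rejects javascript:, data:, file: and the like.
        return Uri.TryCreate(candidate.Trim(), UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

This rejects the javascript: example, but it also rejects scheme-less input such as www.website.com, so you would have to decide whether to prepend http:// before checking. It does nothing about the second case: an http address that points at a script passes the scheme check, and guarding against that needs a separate policy, such as allowing images only from hosts you trust.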
Chris