+1  Q: 

BBcode regex

I'm terrible with regex, but I've had a try and a Google (and even looked in reddit's source) and I'm still stuck, so here goes:

My aim is to match the following 'codes' and replace them with the HTML tags. It's just the regex I'm stuck with.

**bold text**
_italic text_
~hyperlink~

Here's my attempt at the bold one:

^\*\*([.^\*]+)\*\*$

Can anyone point out why this isn't working? I'm using the preg syntax.

Thanks

+2  A: 

use:

\*\*(.[^*]*)\*\*

explanation:

\*\*      // match two literal *'s
(.        // open the capture group and match any one character
[^*]      // match a character that is not a *
*)        // repeat that class zero or more times, then close the group
\*\*      // match two literal *'s

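in a character class "[ ]", "^" is only significant if it's the first character, where it negates the class. anywhere else it is a literal, so the [.^\*] in your attempt only matches a literal ., ^ or *. so (.*) matches anything, and (.[^*]*) matches any one character followed by everything up to the next literal *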
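
A minimal sketch of the behaviour described above, in case it helps to see it run. It uses Python's re module only so it can be executed as-is (an assumption on my part; the patterns are copied unchanged and should behave the same way under preg). The names QUESTION_PATTERN, ANSWER_PATTERN and question_pattern_matches are made up for the illustration.

```python
import re

# Pattern from the question: [.^\*] is a class of three literal characters
# (".", "^" and "*"), not "anything except *".
QUESTION_PATTERN = re.compile(r"^\*\*([.^\*]+)\*\*$")

# Pattern from this answer: one arbitrary character, then non-asterisks.
ANSWER_PATTERN = re.compile(r"\*\*(.[^*]*)\*\*")


def question_pattern_matches(text):
    """Return True if the question's original pattern matches `text`."""
    return QUESTION_PATTERN.search(text) is not None


print(question_pattern_matches("**bold text**"))  # False: "b" is not in the class
print(question_pattern_matches("**.^.**"))        # True: only ".", "^", "*" inside
print(ANSWER_PATTERN.findall("say **bold text** here"))  # ['bold text']
print(ANSWER_PATTERN.findall("***test**"))        # ['*test']: the dot consumed a "*"
```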

edit: in response to the comments, to match an asterisk inside the text (i.e. **bold *text**), you'd have to use a non-greedy match:

\*\*(.*?)\*\*

character classes are more efficient than non-greedy matches, but it's not possible to group within a character class (see "Parentheses and Backreferences...")
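
To make the trade-off concrete, here is a small sketch comparing the non-greedy pattern with a plain negated character class on the example from the edit. Python's re module is used purely so it can be run directly (my assumption, not part of the original question); LAZY, NEGATED and sample are illustrative names.

```python
import re

# Non-greedy: take as little as possible, stopping at the first closing **.
LAZY = re.compile(r"\*\*(.*?)\*\*")

# Negated character class: no * is allowed anywhere inside the bold text.
NEGATED = re.compile(r"\*\*([^*]*)\*\*")

sample = "**bold *text**"

print(LAZY.findall(sample))     # ['bold *text']
print(NEGATED.findall(sample))  # []
```

The class-based pattern finds nothing here because a single * inside the text stops [^*]* before the closing pair is reached.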

Owen
This does not match a string like this: 'What is **3 * 4**?' just because it has one asterisk in it.
yjerem
I think you want to get rid of the period here. It will work for **, but _(.[^_]*)_ will turn __ _hi_ into <i></i>hi_, not <i></i> <i>hi</i>
Mark
Your first answer shouldn't have a dot, it will match on `***test**`.
Brad Gilbert
+2  A: 

First of all, get rid of the ^ and the $. Using those will only match a string that starts with ** and ends with **. Second, use a non-greedy (lazy) quantifier to match as little text as possible, instead of making a character class for all characters other than asterisks.

Here's what I suggest:

\*\*(.+?)\*\*
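
A quick sketch of both points, assuming Python's re module as a stand-in for preg so that it can be run as-is (the patterns themselves are unchanged). The sample text and variable names are only for illustration.

```python
import re

text = "some **bold** and **more bold** text"

# Greedy: .+ runs to the last ** in the string, swallowing both bold spans.
greedy = re.findall(r"\*\*(.+)\*\*", text)

# Non-greedy: .+? stops at the first closing **, so each span is found separately.
lazy = re.findall(r"\*\*(.+?)\*\*", text)

# Anchored: ^ and $ require the whole string to be one bold span.
anchored = re.findall(r"^\*\*(.+?)\*\*$", text)

print(greedy)    # ['bold** and **more bold']
print(lazy)      # ['bold', 'more bold']
print(anchored)  # []
```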
yjerem
+1  A: 
\*\*(.*?)\*\*

that will work for the bold text.

just replace each \*\* in the pattern with _ or ~ for the others
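
Putting the three patterns together might look roughly like the sketch below. It is written in Python so it can be run directly; with preg the same patterns would go through preg_replace. The tag choices (<b>, <i>, and an <a> whose href is the link text), the RULES table and the to_html name are assumptions made for the example, since the question doesn't say which HTML it wants. It uses .+? rather than .*? so that an empty pair of markers is left alone.

```python
import re

# (pattern, replacement) pairs; the HTML tags are assumed, not from the question.
RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<b>\1</b>"),
    (re.compile(r"_(.+?)_"), r"<i>\1</i>"),
    (re.compile(r"~(.+?)~"), r'<a href="\1">\1</a>'),
]


def to_html(text):
    """Apply each replacement rule in turn and return the converted text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(to_html("**bold text** _italic text_ ~http://example.com~"))
# <b>bold text</b> <i>italic text</i> <a href="http://example.com">http://example.com</a>
```

One thing to watch out for: a link containing underscores would be caught by the italic rule first, so real input would probably need the rules ordered or restricted more carefully.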

Adam
+1  A: 
J.F. Sebastian
Best answer so far.
Brad Gilbert