(\[(c|C)=)(#?([a-fA-F0-9]{1,2}){3})\](.*)\[/(c|C)\]

I want this expression to match text like: "This is [c=FFFFFF]white text[/c] and [C=#000]black text[/C]."

It does match a single BB code on its own, but when several follow each other (as in the example), it produces one match spanning both BB-code sequences (from [c=FFFFFF]wh... to ...ck text[/C]).

Why is this happening? Also, how do I make the dot (.) include newlines in C#?

+1  A: 

This happens because the .* in your RE is greedy: it consumes as much as it can, so it runs to the last [/C] in the input instead of stopping at the first closing tag.

You can make the quantifier non-greedy (lazy) instead; see the linked document for tips on what to try.
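To see the difference, here is a minimal C# sketch that counts matches for the question's pattern with a greedy and a lazy tag body. The class, constant and method names (GreedyVsLazy, Greedy, Lazy, CountMatches) are made up for illustration; only Regex.Matches comes from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // The question's pattern: the tag body is the greedy (.*).
    public const string Greedy = @"(\[(c|C)=)(#?([a-fA-F0-9]{1,2}){3})\](.*)\[/(c|C)\]";

    // Same pattern, but the tag body is the lazy (.*?).
    public const string Lazy = @"(\[(c|C)=)(#?([a-fA-F0-9]{1,2}){3})\](.*?)\[/(c|C)\]";

    // Hypothetical helper: number of non-overlapping matches of pattern in input.
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        string input = "This is [c=FFFFFF]white text[/c] and [C=#000]black text[/C].";
        Console.WriteLine(CountMatches(Greedy, input)); // 1 (one match spanning both tags)
        Console.WriteLine(CountMatches(Lazy, input));   // 2 (one match per tag)
    }
}
```

The only change between the two patterns is the ? after .*, which makes the quantifier stop at the first closing tag it can reach.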

unwind
+3  A: 

If you don't care about nested tags, you can do this:

(\[[cC]=)(#?([a-fA-F0-9]{3}){1,2})\](.*?)\[/[cC]\]
//                                     ^- lazy match

If you want to handle nested tags with a regex, check this article on Code Project.

ybo
+2  A: 

Dot matches newline characters if you set the option RegexOptions.Singleline (more on that here).
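A small sketch of the effect, assuming the lazy pattern posted elsewhere in this thread. SinglelineDemo and Matches are hypothetical names; Regex.IsMatch and RegexOptions are the real .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Lazy pattern from this thread; the tag body is (.*?).
    public const string Pattern = @"(\[[cC]=)(#?([a-fA-F0-9]{3}){1,2})\](.*?)\[/[cC]\]";

    // Hypothetical helper: does the pattern match anywhere in input under the given options?
    public static bool Matches(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, Pattern, options);
    }

    public static void Main()
    {
        // The tag body contains a newline.
        string input = "[c=FFFFFF]white\ntext[/c]";
        Console.WriteLine(Matches(input, RegexOptions.None));       // False: . stops at \n
        Console.WriteLine(Matches(input, RegexOptions.Singleline)); // True: . also matches \n
    }
}
```

Putting the inline option (?s) at the start of the pattern should have the same effect if you would rather not pass options separately.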

acezanne
A: 

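You need a lazy quantifier so that a single match does not swallow all of the [c] tags.

Try this:

\[c=(#?.*?)\](.*?)\[/c\]

or, if the tag content only ever consists of word characters (\w does not match spaces or punctuation, so it will not match "white text"):

\[c=(#?\w*?)\](\w*?)\[/c\]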

You should also set RegexOptions.IgnoreCase on your Regex object so that both [c] and [C] are matched.
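Roughly, that could look like the sketch below. IgnoreCaseDemo and Bodies are illustrative names only; the pattern is the first one above, and Groups[2] is its second capture group (the tag body).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IgnoreCaseDemo
{
    // Lowercase-only pattern; IgnoreCase lets it match [C=...] and [/C] as well.
    static readonly Regex Tag =
        new Regex(@"\[c=(#?.*?)\](.*?)\[/c\]", RegexOptions.IgnoreCase);

    // Hypothetical helper: the text inside each color tag, in order of appearance.
    public static List<string> Bodies(string input)
    {
        var bodies = new List<string>();
        foreach (Match m in Tag.Matches(input))
            bodies.Add(m.Groups[2].Value);
        return bodies;
    }

    public static void Main()
    {
        string input = "This is [c=FFFFFF]white text[/c] and [C=#000]black text[/C].";
        Console.WriteLine(string.Join(" | ", Bodies(input))); // white text | black text
    }
}
```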

skyfoot
A: 

Regex is a quick and dirty way to do this, and the fix here is to use .*? rather than just .*. However, if you want a more robust solution, it is probably easier to do without regex. C# regexes happen to be able to match nested structures, but that doesn't mean it's actually easy. It would be better to use a lexical parser and construct a DOM. Most likely the code will be easier to read and maintain.
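As a rough idea of what that could look like, here is a minimal hand-written scanner that builds a tree of color spans using a stack. Everything in it (Node, BbColorParser, Parse, Dump) is a hypothetical sketch rather than an existing library, and it only handles the [c=...] tag from the question. It assumes unclosed tags are closed at the end of the input and that stray closing tags are kept as plain text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical tree node: a text node has Text set; a span (or the root) has Children.
public sealed class Node
{
    public string Color;   // null for the root and for text nodes
    public string Text;    // non-null only for text nodes
    public List<Node> Children = new List<Node>();
}

public static class BbColorParser
{
    public static Node Parse(string input)
    {
        var root = new Node();
        var stack = new Stack<Node>();   // currently open spans, innermost on top
        stack.Push(root);
        var text = new StringBuilder();  // plain text collected since the last tag
        int i = 0;
        while (i < input.Length)
        {
            // Position of the ']' that ends an opening tag starting at i, or -1.
            int close = StartsWith(input, i, "[c=") ? input.IndexOf(']', i) : -1;
            if (close >= 0)
            {
                Flush(text, stack.Peek());
                var span = new Node { Color = input.Substring(i + 3, close - i - 3) };
                stack.Peek().Children.Add(span);
                stack.Push(span);
                i = close + 1;
            }
            else if (StartsWith(input, i, "[/c]") && stack.Count > 1)
            {
                Flush(text, stack.Peek());
                stack.Pop();             // close the innermost open span
                i += 4;
            }
            else
            {
                text.Append(input[i++]); // ordinary character
            }
        }
        Flush(text, stack.Peek());
        return root;
    }

    // Case-insensitive check that s contains prefix at position index.
    static bool StartsWith(string s, int index, string prefix)
    {
        return index + prefix.Length <= s.Length &&
               string.Compare(s, index, prefix, 0, prefix.Length,
                              StringComparison.OrdinalIgnoreCase) == 0;
    }

    // Turns any pending plain text into a text node under parent.
    static void Flush(StringBuilder text, Node parent)
    {
        if (text.Length == 0) return;
        parent.Children.Add(new Node { Text = text.ToString() });
        text.Clear();
    }

    // Debug rendering: "text", color(children...), and [children...] for the root.
    public static string Dump(Node node)
    {
        if (node.Text != null) return "\"" + node.Text + "\"";
        var parts = new List<string>();
        foreach (var child in node.Children) parts.Add(Dump(child));
        string inner = string.Join(", ", parts);
        return node.Color == null ? "[" + inner + "]" : node.Color + "(" + inner + ")";
    }

    public static void Main()
    {
        var tree = Parse("This is [c=FFFFFF]white text[/c] and [C=#000]black text[/C].");
        Console.WriteLine(Dump(tree));
        // ["This is ", FFFFFF("white text"), " and ", #000("black text"), "."]
    }
}
```

Nested tags fall out of the stack for free, which is the part that gets awkward with a single regex.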

Adam Luter