I'm looking to tokenize really simple strings, but I'm struggling to get the regex right.

The strings might look like this:

string1 = "{[Surname]}, some text... {[FirstName]}"

string2 = "{Item}foo.{Item2}bar"

I want to extract the tokens in the curly braces, so string1 yields "{[Surname]}" and "{[FirstName]}", and string2 yields "{Item}" and "{Item2}".

So basically, there are two different token types I want to extract: {[Foo]} and {Bar}.

This question is quite good, but I can't get the regex right: "Poor man's lexer for C#". Thanks for the help!

+1  A: 

Unless the rules are very convoluted, that will be (?<Token>\{\[.+?\]\}) for the first string and (?<Token>\{.+?\}) for the second.
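As a rough illustration, the two patterns can be tried against the sample strings from the question. This is only a sketch: `Extract` is a hypothetical helper that is not part of the thread, and the code assumes C# 9 or later so that it can use top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread): returns the text of every match of `pattern` in `input`.
static string[] Extract(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string string1 = "{[Surname]}, some text... {[FirstName]}";
string string2 = "{Item}foo.{Item2}bar";

// The lazy quantifier .+? stops at the first closing delimiter,
// so each token is matched separately instead of one long span.
string[] dataTokens = Extract(string1, @"(?<Token>\{\[.+?\]\})");
string[] fieldTokens = Extract(string2, @"(?<Token>\{.+?\})");

Console.WriteLine(string.Join(" | ", dataTokens));   // {[Surname]} | {[FirstName]}
Console.WriteLine(string.Join(" | ", fieldTokens));  // {Item} | {Item2}
```

Note that the second pattern on its own would also match {[Surname]}, since `.` accepts square brackets, so the two patterns do not separate the token types if both kinds appear in one string.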

Anton Gogolev
+1  A: 

What about (?<token>\{[^\}]*\})?
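A quick sketch of how this single pattern behaves on the strings from the question. Classifying a token by its "{[" prefix is an assumption added here for illustration, not something the answer proposes; the code also assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var braces = new Regex(@"(?<token>\{[^\}]*\})");

string[] fromString1 = braces.Matches("{[Surname]}, some text... {[FirstName]}")
    .Cast<Match>().Select(m => m.Groups["token"].Value).ToArray();
string[] fromString2 = braces.Matches("{Item}foo.{Item2}bar")
    .Cast<Match>().Select(m => m.Groups["token"].Value).ToArray();

// Assumption: a token that starts with "{[" is the {[Foo]} kind, anything else is the {Bar} kind.
foreach (string token in fromString1.Concat(fromString2))
{
    string kind = token.StartsWith("{[", StringComparison.Ordinal) ? "data" : "field";
    Console.WriteLine($"{kind}: {token}");
}

// Because of the * quantifier, an empty pair of braces also counts as a match.
bool matchesEmpty = braces.IsMatch("{}");
Console.WriteLine(matchesEmpty);
```

The negated character class [^\}] does the same job as a lazy quantifier: it cannot run past the first closing brace. The trade-off is that one pattern catches both token types, so telling them apart takes a second step.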

PierrOz
A: 

They're both good answers, guys, thanks. Here's what I settled on in the end:

// Requires: using System.Text.RegularExpressions;

// DataToken = {[foo]}
// FieldToken = {Bar}
string pattern = @"(?<DataToken>\{\[\w+\]\})|(?<FieldToken>\{\w+\})";

MatchCollection matches = Regex.Matches(expression.ExpressionString, pattern,
    RegexOptions.ExplicitCapture);

foreach (Match m in matches)
{
    // Note that EITHER FieldToken OR DataToken will have a value on each iteration;
    // the group that did not take part in the match yields an empty string.
    string fieldToken = m.Groups["FieldToken"].Value;
    string dataToken = m.Groups["DataToken"].Value;

    if (!string.IsNullOrEmpty(dataToken))
    {
        // Do something
    }

    if (!string.IsNullOrEmpty(fieldToken))
    {
        // Do something else
    }
}
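For anyone who wants to run this without the surrounding `expression` object, here is a self-contained sketch of the same idea. `Tokenize` and the sample inputs are hypothetical, it checks `Group.Success` rather than testing for an empty string, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: tags each match with the name of the group that captured it.
static List<(string Kind, string Text)> Tokenize(string input)
{
    const string pattern = @"(?<DataToken>\{\[\w+\]\})|(?<FieldToken>\{\w+\})";
    var tokens = new List<(string Kind, string Text)>();

    foreach (Match m in Regex.Matches(input, pattern, RegexOptions.ExplicitCapture))
    {
        // Group.Success reports which alternative took part in this match.
        string kind = m.Groups["DataToken"].Success ? "DataToken" : "FieldToken";
        tokens.Add((kind, m.Value));
    }

    return tokens;
}

var mixed = Tokenize("{[Surname]}, {Item}foo");
foreach (var (kind, text) in mixed)
    Console.WriteLine($"{kind}: {text}");

// \w+ accepts only letters, digits and underscores, so a name containing a space yields no tokens.
var none = Tokenize("{[First Name]}");
Console.WriteLine(none.Count);
```

One thing to keep in mind with \w+ is that it is stricter than the patterns in the other answers: tokens containing spaces, dots or hyphens are silently skipped rather than matched.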
Pete