I need a regular expression to get the text between two tags.

Let's say I want an array returned containing any text between <data> and </data> tags, or any text between "(" and ")".

How can I do that with regular expressions in C#?


An advanced question would be:

  1. The input string is "color=rgb(50,20,30)"
  2. How can I get the 3 numbers in 3 separate array slots as returned by the RegEx processor in C#?
+3  A: 

The Perl regexp would be:

$string =~ /color=rgb\((\d+),(\d+),(\d+)\)/;
@array = ($1,$2,$3);

But you probably need more information than this.
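For the C# the question asks about, a rough equivalent might look like the sketch below. The class and method names (RegexSketch, FirstGroups, DataContents, ParenContents, RgbParts) are made up for illustration, and the tag and parenthesis patterns assume the delimiters are not nested.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSketch
{
    // Hypothetical helper: returns capture group 1 of every match as an array.
    public static string[] FirstGroups(string input, string pattern)
    {
        // Singleline lets "." match newlines, so content may span lines.
        return Regex.Matches(input, pattern, RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    // Text between <data> and </data>; the lazy .*? stops at the first closing tag.
    public static string[] DataContents(string input)
    {
        return FirstGroups(input, @"<data>(.*?)</data>");
    }

    // Text between "(" and ")", assuming no nested parentheses.
    public static string[] ParenContents(string input)
    {
        return FirstGroups(input, @"\(([^()]*)\)");
    }

    // The three numbers of "color=rgb(50,20,30)" in three array slots.
    public static string[] RgbParts(string input)
    {
        Match m = Regex.Match(input, @"color=rgb\((\d+),(\d+),(\d+)\)");
        if (!m.Success) return new string[0];
        // Groups[0] is the whole match; Groups[1] to Groups[3] play the role of Perl's $1 to $3.
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }
}
```

Calling RegexSketch.RgbParts("color=rgb(50,20,30)") should give the three numbers as strings; convert them with Int32.Parse if you need integers.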

Arkadiy
Thank you! I'll try that.
Jenko
A: 

I believe the real problems will arise when you want to parse nested constructs. For example, if you want to examine XML like <data><data>123</data><data>456</data></data> and extract the data inside the outermost <data> tags, a single RegEx would not be enough. Just a warning not to use RegEx where more powerful and more specific methods exist. Real XML parsers should be considered for more complex tasks on XML. My 2 cents...
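As a minimal sketch of the parser route, assuming LINQ to XML (System.Xml.Linq) is available and the input is well-formed XML, the nested example above could be handled like this. XmlSketch and LeafDataValues are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlSketch
{
    // Returns the text of every <data> element that has no child elements.
    public static string[] LeafDataValues(string xml)
    {
        // Parse throws an XmlException if the input is not well-formed.
        XElement root = XElement.Parse(xml);
        return root.DescendantsAndSelf("data")
                   .Where(e => !e.HasElements)
                   .Select(e => e.Value)
                   .ToArray();
    }
}
```

The parser keeps track of the nesting, so the outermost element is simply the root and each inner element can be reached without any pattern matching.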

IgorK
Yeah, regexps are famous for not dealing with recursive data. Theoretically incapable of it, in fact.
Arkadiy
Yeah, you are right. You can match bounded nesting (e.g. up to 3 levels of nested tags and no more) but you can't solve the problem for an arbitrary level of recursion. A finite automaton just can't keep track of the unbounded amount of state needed to reach any recursion depth.
IgorK
I recently found an interesting feature of .NET regex: balanced matching (see http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx ). With this feature you actually *can* match brackets. That was a bit of a surprise to me...
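To illustrate, here is a sketch of the commonly used balancing-group idiom for parentheses. It is not necessarily the exact pattern from the linked post, and the names BalancedSketch, FirstBalanced and the group name depth are made up for this example.

```csharp
using System.Text.RegularExpressions;

public static class BalancedSketch
{
    private static readonly Regex Balanced = new Regex(
        @"\(                     # opening parenthesis of the outer pair
            (?>                  # atomic group, one item per iteration
                [^()]+           # a run of characters that are not parentheses
              | \( (?<depth>)    # nested open: push onto the depth stack
              | \) (?<-depth>)   # nested close: pop, fails if the stack is empty
            )*
            (?(depth)(?!))       # fail if an open parenthesis is still unclosed
          \)                     # closing parenthesis of the outer pair
        ",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the first balanced parenthesised span, or null if there is none.
    public static string FirstBalanced(string input)
    {
        Match m = Balanced.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

The named group acts as a counter: each nested "(" pushes an empty capture, each nested ")" pops one, and the conditional at the end rejects the match if anything is left on the stack.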
IgorK
+1  A: 

This similar question has answers that will help:

Jenko
+2  A: 

Since you specifically mentioned C#, here's how I'm doing that exact parsing:

private static readonly Regex RgbValuePattern = new Regex(@"(?<r>\d{1,3}) ?, ?(?<g>\d{1,3}) ?, ?(?<b>\d{1,3})",
                                                          RegexOptions.Compiled | RegexOptions.ExplicitCapture);

Then later on...

var match = RgbValuePattern.Match(value);

if (match.Success)
{
    int r = Int32.Parse(match.Groups["r"].Value, NumberFormatInfo.InvariantInfo);
    int g = Int32.Parse(match.Groups["g"].Value, NumberFormatInfo.InvariantInfo);
    int b = Int32.Parse(match.Groups["b"].Value, NumberFormatInfo.InvariantInfo);
    return Color.FromArgb(r, g, b);
}
Joel Mueller
Thanks so much for the great code.
Jenko
Gah! This reminds me why I avoid regexps in C++ - extremely awkward to work with! Perl is still the best language for that...
Arkadiy
+1  A: 

Using Regex to parse XML is usually a really bad idea. See this answer.

rjmunro