Say I have a string whose format I need to verify, e.g. RR1234566-001 (2 letters, 7 digits, a dash, then 1 or more digits). I use something like:

        Regex regex = new Regex(patternString);
        if (regex.IsMatch(stringToMatch))
        {
            return true;
        }
        else
        {
            return false;
        }

This tells me whether stringToMatch follows the pattern defined by patternString. What I actually need, though, are portions of stringToMatch, which I currently end up extracting later: 1234566 and 001.

Please note that this is NOT a question about how to construct regular expressions. What I am asking is: "Is there a way to match and extract values simultaneously without having to use a split function later?"

+8  A: 

You can use regex groups to accomplish that. For example, this regex:

(\d\d\d)-(\d\d\d\d\d\d\d)

Let's match a telephone number with this regex:

var regex = new Regex(@"(\d\d\d)-(\d\d\d\d\d\d\d)");
var match = regex.Match("123-4567890");
if (match.Success)
    ....

If it matches, you will find the first three digits in:

match.Groups[1].Value

And the following seven digits in:

match.Groups[2].Value

P.S. In C#, you can use a @"" style string to avoid escaping backslashes. For example, @"\hi\" equals "\\hi\\". Useful for regular expressions and paths.

P.S.2. The first group is stored in Groups[1], not Groups[0] as you might expect. That's because Groups[0] contains the entire matched string.
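
To make the indexing concrete, here is a minimal sketch using the same telephone regex as above. The class name GroupIndexDemo and the helper Capture are made up for illustration; only Regex.Match, Match.Success and Match.Groups are actual .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupIndexDemo
{
    // Hypothetical helper: returns the whole match followed by the two captured groups,
    // or an empty array when the input contains no match.
    public static string[] Capture(string input)
    {
        Match match = Regex.Match(input, @"(\d\d\d)-(\d\d\d\d\d\d\d)");
        if (!match.Success)
        {
            return new string[0];
        }
        return new[]
        {
            match.Groups[0].Value, // entire matched text
            match.Groups[1].Value, // first parenthesized group
            match.Groups[2].Value  // second parenthesized group
        };
    }

    public static void Main()
    {
        // Prints 123-4567890, then 123, then 4567890, one per line.
        foreach (string value in Capture("call 123-4567890 now"))
        {
            Console.WriteLine(value);
        }
    }
}
```

Note that the pattern is not anchored here, so the surrounding text ("call", "now") is ignored and Groups[0] holds only the part that matched.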

Andomar
+1 Very thorough! I'd add one thing though, the reason that you start on match.Groups[1] and not [0] is because [0] contains the entire matched string.
Neil Williams
+3  A: 

Use grouping and Match instead of IsMatch.

I.e.:

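// Assumes stringToMatch is defined by the caller, as in the question.
Regex re = new Regex("(\\d+)-(\\d+)");
Match m = re.Match(stringToMatch);
if (m.Success) {
  // Groups[0] is the whole match; captured groups start at index 1.
  string part1 = m.Groups[1].Value;
  string part2 = m.Groups[2].Value;
  return true;
}
else {
  return false;
}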

You can also name the matches, like this:

Regex re = new Regex("(?<Part1>\\d+)-(?<Part2>\\d+)");

and access them like this:

  string part1 = m.Groups["Part1"].Value;
  string part2 = m.Groups["Part2"].Value;
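
Applied to the format from the question, named groups let you validate and extract in one call. The following is only a sketch: the class TrackingCodeParser, the method TryParse and the group names Number and Suffix are invented for illustration, and the pattern assumes the "2 letters, 7 digits, dash, 1 or more digits" layout described by the asker.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrackingCodeParser
{
    // Hypothetical helper: checks the format and extracts both numeric parts in one pass.
    // Returns false (and empty strings) when the input does not match.
    public static bool TryParse(string input, out string number, out string suffix)
    {
        Match m = Regex.Match(input, @"^[A-Za-z]{2}(?<Number>\d{7})-(?<Suffix>\d+)$");
        number = m.Success ? m.Groups["Number"].Value : string.Empty;
        suffix = m.Success ? m.Groups["Suffix"].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string number;
        string suffix;
        if (TrackingCodeParser.TryParse("RR1234566-001", out number, out suffix))
        {
            Console.WriteLine(number); // 1234566
            Console.WriteLine(suffix); // 001
        }
    }
}
```

Because the values come straight out of the Match object, no separate split step is needed after validation.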
cyberconte
very useful tip!
gnomixa
+1 for named groups
Rob Fonseca-Ensor
+3  A: 

You can use parentheses to capture groups of characters:

string test = "RR1234566-001";

// capture 2 letters, then 7 digits, then a hyphen, then 1 or more digits
string rx = @"^([A-Za-z]{2})(\d{7})(\-)(\d+)$";

Match m = Regex.Match(test, rx, RegexOptions.IgnoreCase);

if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value);    // RR
    Console.WriteLine(m.Groups[2].Value);    // 1234566
    Console.WriteLine(m.Groups[3].Value);    // -
    Console.WriteLine(m.Groups[4].Value);    // 001
    return true;
}
else
{
    return false;
}
LukeH
+1 for the right regex... btw if you use IgnoreCase, you can use [a-z] instead of [A-Za-z].
Andomar