ansaurus

Question

Differences among .NET Capture, Group, Match

Answer 1

A Match is the result of one successful match of the entire regex. Groups and Captures both relate to capturing groups (each parenthesized (expression) within the regex), but they differ in how they behave. Here's a quote from the MSDN article on the Capture class that explains the difference:

If you do not apply a quantifier to a capturing group, the Group.Captures property returns a CaptureCollection with a single Capture object that provides information about the same capture as the Group object. If you do apply a quantifier to a capturing group, the Group.Index, Group.Length, and Group.Value properties provide information only about the last captured group, whereas the Capture objects in the CaptureCollection provide information about all subexpression captures. The example provides an illustration.

(Source)
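As a rough sketch of the unquantified case the quote describes: the input string and pattern below are made up for illustration and are not from the MSDN article. With no quantifier applied to the group itself, the group's CaptureCollection should hold one Capture that reports the same value, index, and length as the Group.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input and pattern, chosen only for illustration.
// The + is inside the parentheses, so the group itself is not quantified.
Match m = Regex.Match("key=value", @"^(\w+)=(\w+)$");

Group g = m.Groups[1];          // first capturing group
Capture only = g.Captures[0];   // its single Capture

Console.WriteLine(g.Captures.Count);
Console.WriteLine(only.Value == g.Value
                  && only.Index == g.Index
                  && only.Length == g.Length);
```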

Amber 2010-02-12 08:05:22

Answer 2

Here's a simpler example than the one in the document @Dav cited:

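using System;
using System.Text.RegularExpressions;

string s0 = @"foo%123%456%789";
// Group #1 is the leading word; group #2 sits inside a repeated non-capturing group.
Regex r0 = new Regex(@"^([a-z]+)(?:%([0-9]+))+$");
Match m0 = r0.Match(s0);
if (m0.Success)
{
  Console.WriteLine(@"full match: {0}", m0.Value);
  Console.WriteLine(@"group #1: {0}", m0.Groups[1].Value);
  // Value on a repeated group reports only its final capture.
  Console.WriteLine(@"group #2: {0}", m0.Groups[2].Value);
  // Captures keeps every iteration, in match order.
  Console.WriteLine(@"group #2 captures: {0}, {1}, {2}",
                    m0.Groups[2].Captures[0].Value,
                    m0.Groups[2].Captures[1].Value,
                    m0.Groups[2].Captures[2].Value);
}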

result:

full match: foo%123%456%789
group #1: foo
group #2: 789
group #2 captures: 123, 456, 789

The full match and group #1 results are straightforward, but the others require some explanation. Group #2, as you can see, is inside a non-capturing group that's controlled by a + quantifier. It matches three times, but if you request its Value, you only get what it matched the third time around--the final capture. Similarly, if you use the $2 placeholder in a replacement string, the final capture is what gets inserted in its place.
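To illustrate the point about replacement strings, here is a minimal sketch that reuses the same input and pattern. The replacement text "$1 -> $2" is just an example I chose. Only the final capture of group #2 should show up in the result.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "foo%123%456%789";
string pattern = @"^([a-z]+)(?:%([0-9]+))+$";

// $2 refers to the group's Value, which is its last capture.
string replaced = Regex.Replace(input, pattern, "$1 -> $2");

Console.WriteLine(replaced);  // foo -> 789
```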

In most regex flavors, that's all you can get; each intermediate capture is overwritten by the next and lost. .NET is almost unique in preserving all of the captures and making them available after the match is performed. You can access them directly as I did here, or iterate through the CaptureCollection as you would a MatchCollection. There's no replacement-string placeholder, analogous to $1, for the individual captures, though.
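Iterating the CaptureCollection could look something like the sketch below, again using the same input and pattern. The `values` list is only a helper I introduced to collect the results.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

Match m = Regex.Match("foo%123%456%789", @"^([a-z]+)(?:%([0-9]+))+$");

// Collect every capture of group #2, not just the last one.
var values = new List<string>();
foreach (Capture c in m.Groups[2].Captures)
{
    values.Add(c.Value);
    Console.WriteLine("{0} at index {1}", c.Value, c.Index);
}
```

Each Capture also carries its own Index and Length, so the position of every intermediate match is available as well as its text.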

So the reason the API design is so ugly (as you put it) is twofold: first it was adapted from Perl's integral regex support to .NET's object-oriented framework; then the CaptureCollection structure was grafted onto it. Perl 6 offers a much cleaner solution, but the authors accomplished that by rewriting Perl practically from scratch and throwing backward compatibility out the window.

Alan Moore 2010-02-12 12:35:01

Hi, thanks Alan. Just two more questions: there's a property named "Captures" on both Match and Group, so what's the difference between Match.Captures and Group.Captures? And why is Match.Groups[0] always the same as Match.Value? Thanks again.

smwikipedia 2010-02-14 10:40:35

A Match IS-A Group, which IS-A Capture, so Match inherits `Captures` from Group and `Value` from Capture. The Match is the zeroth Group, so `Match.Captures` is simply a one-element list containing the whole match--as if we needed *another* way to refer to it! `Value` is obviously the preferred way, being so much shorter and more intuitive. But even that's optional if you use the Match/Group/Capture reference where a string is expected, because `ToString()` just delegates to `Value`.

Alan Moore 2010-02-14 18:54:45
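The relationships described in the comment above can be checked with a short sketch like this one. The input and pattern are the ones from the answer, and the variable names are mine.

```csharp
using System;
using System.Text.RegularExpressions;

Match m = Regex.Match("foo%123%456%789", @"^([a-z]+)(?:%([0-9]+))+$");

// Inheritance chain: Match derives from Group, which derives from Capture.
bool matchIsGroup = typeof(Match).BaseType == typeof(Group);
bool groupIsCapture = typeof(Group).BaseType == typeof(Capture);

// Group 0 is the whole match, and Match.Captures holds just that one capture.
bool groupZeroIsMatch = m.Groups[0].Value == m.Value;
bool oneCapture = m.Captures.Count == 1 && m.Captures[0].Value == m.Value;

// ToString() returns the same text as Value.
bool toStringIsValue = m.ToString() == m.Value;

Console.WriteLine("{0} {1} {2} {3} {4}",
    matchIsGroup, groupIsCapture, groupZeroIsMatch, oneCapture, toStringIsValue);
```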

As for *why* the Match is the zero'th Group, see this answer: http://stackoverflow.com/questions/2248213/in-c-regular-expression-why-does-the-initial-match-show-up-in-the-groups/2248767#2248767

Alan Moore 2010-02-14 18:56:08

Thanks, Alan. I will read the links. I'll let you know when I understand it. =8^D

smwikipedia 2010-02-15 07:40:57

Hi, Alan, I have read the link you provided. Though there are many other explanations, I agree with you that Match.Groups[0] is used for the whole match because regex capture groups are 1-based. That's exactly what I had assumed. Cheers!

smwikipedia 2010-02-15 08:22:04
