I'm writing a translator, not as any serious project, just for fun and to become a bit more familiar with regular expressions. From the code below I think you can work out where I'm going with this (cheezburger anyone?).

I'm using a dictionary whose keys are regular expressions and whose values are List<string> collections of possible replacements. To pick a substitute I need to know which key produced the match. How can I work out which pattern triggered it?

        var dictionary = new Dictionary<string, List<string>>
        {                     
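            // Negative lookbehind, (?<!...), checks the text before the match;
            // a lookahead such as (?!e)ight can never fail, because "i" follows.
            {"(?<!e)ight", new List<string>(){"ite"}},
            {"(?<!ues)tion", new List<string>(){"shun"}},
            {"(?:god|allah|buddah?|diety)", new List<string>(){"ceiling cat"}},
            ..
        };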

        var regex = "(" + String.Join(")|(", dictionary.Keys.ToArray()) + ")";

        foreach (Match metamatch in Regex.Matches(input
           , regex
           , RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
        {
            substitute = GetRandomReplacement(dictionary[ ????? ]);
            input = input.Replace(metamatch.Value, substitute);
        }

Is what I'm attempting possible, or is there a better way to achieve this insanity?

+6  A: 

You can name each capture group in a regular expression and then query the value of each named group in your match. This should allow you to do what you want.

For example, using the regular expression below,

(?<Group1>(?<!e)ight)

you can then extract the group matches from your match result:

match.Groups["Group1"].Captures
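
As a rough sketch of how this could apply to the combined pattern in the question: wrap each key in its own named group, then test each group's Success property to see which alternative fired. The group names p0, p1, ... and the class and method names are made up for illustration, and the three sample patterns are simplified stand-ins for the dictionary keys.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WhichPattern
{
    // Hypothetical sample patterns, kept in an array so the order is fixed.
    static readonly string[] Patterns = { @"(?<!e)ight", @"tion", @"(?:god|allah)" };

    // Builds (?<p0>...)|(?<p1>...)|(?<p2>...); named groups still capture
    // under RegexOptions.ExplicitCapture.
    static readonly Regex Combined = new Regex(
        string.Join("|", Patterns.Select((p, i) => $"(?<p{i}>{p})")),
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Returns the index of the pattern whose group took part in the match, or -1.
    public static int IndexOf(Match m)
    {
        for (int i = 0; i < Patterns.Length; i++)
            if (m.Groups["p" + i].Success) return i;
        return -1;
    }

    // Convenience wrapper: index of the pattern behind the first match in input.
    public static int FirstMatchIndex(string input)
    {
        Match m = Combined.Match(input);
        return m.Success ? IndexOf(m) : -1;
    }
}
```

The returned index can then be used to look up the replacement list that belongs to that pattern.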
Jeff Yates
Thanks, this is exactly what I needed!
Andrew
@Andrew: Happy to help.
Jeff Yates
+1  A: 

Using named groups like Jeff says is the most robust way.

You can also access the groups by number, counted in the order their opening parentheses appear in your pattern.

(first)|(second)

can be accessed with

match.Groups[1].Value // group 1 -> first
match.Groups[2].Value // group 2 -> second

Of course, if you have more parentheses that you don't want counted as groups, use the non-capturing form (?:...):

((?:f|F)irst)|((?:s|S)econd)

match.Groups[2].Value // still group 2 -> second
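
A small sketch of the numbered approach, with a hypothetical helper that reports which numbered group took part in the match. One caveat worth noting: the question's code passes RegexOptions.ExplicitCapture, which turns unnamed parentheses into non-capturing groups, so this only works when that option is left out.

```csharp
using System;
using System.Text.RegularExpressions;

static class NumberedGroups
{
    // Returns 1 or 2 depending on which alternative matched, or 0 for no match.
    public static int MatchedGroup(string input)
    {
        // No ExplicitCapture here, so the two outer parentheses are groups 1 and 2;
        // the inner (?:...) groups are not numbered.
        Match m = Regex.Match(input, @"((?:f|F)irst)|((?:s|S)econd)");
        if (!m.Success) return 0;

        // Groups[0] is the whole match, so start counting at 1.
        for (int i = 1; i < m.Groups.Count; i++)
            if (m.Groups[i].Success) return i;
        return 0;
    }
}
```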
Mikael Svenson
+1  A: 

You've got another problem. Check this out:

string s = @"My weight is slight.";
Regex r = new Regex(@"(?<!e)ight\b");
foreach (Match m in r.Matches(s))
{
  s = s.Replace(m.Value, "ite");
}
Console.WriteLine(s);

output:

My weite is slite.

String.Replace is a global operation, so even though weight doesn't match the regex, it gets changed anyway when slight is found. You need to do the match, lookup, and replace at the same time; Regex.Replace(String, MatchEvaluator) will let you do that.
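
A minimal sketch of that approach, assuming a small rule table in place of the question's dictionary. The Translator class, the Rules list, and the p0/p1 group names are hypothetical, and the evaluator always takes the first candidate replacement where the original would pick one at random.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class Translator
{
    // Hypothetical rule table: pattern -> candidate replacements.
    static readonly List<KeyValuePair<string, string[]>> Rules =
        new List<KeyValuePair<string, string[]>>
        {
            new KeyValuePair<string, string[]>(@"(?<!e)ight\b", new[] { "ite" }),
            new KeyValuePair<string, string[]>(@"tion\b", new[] { "shun" }),
        };

    // One named group per rule, so the evaluator can tell which rule matched.
    static readonly Regex Combined = new Regex(
        string.Join("|", Rules.Select((r, i) => $"(?<p{i}>{r.Key})")),
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    public static string Translate(string input)
    {
        // The evaluator is called once per match and only that span is replaced,
        // so text the regex skipped (such as "weight") is left untouched.
        return Combined.Replace(input, m =>
        {
            for (int i = 0; i < Rules.Count; i++)
                if (m.Groups["p" + i].Success)
                    return Rules[i].Value[0]; // a random pick could go here instead
            return m.Value; // not expected to be reached; keep the text unchanged
        });
    }
}
```

With these two rules, Translator.Translate("My weight is slight.") returns "My weight is slite.".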

Alan Moore