I had a regex, like so:

(?<one-1>cat)|(?<two-2>dog)|(?<three-3>mouse)|(?<four-4>fish)

When I tried to use this pattern in a .NET app, it failed because the group names contain a '-'.

So, as a workaround, I tried to use two regexes, the first:

(?<A>cat)|(?<Be>dog)|(?<C>mouse)|(?<D>fish)

would match the original cases I was looking for, using group names I could control.
I then intended to look up the matched group name in a second regex like this:

(?<A>one-1)|(?<Be>two-2)|(?<C>three-3)|(?<D>four-4)

I would do so by finding the string that matched this second pattern and checking whether the group names were equal.

I know this seems a bit convoluted. Thanks for any help offered.

+1  A: 

Try using underscores instead of dashes. When I changed your original regex to:

(?<one_1>cat)|(?<two_2>dog)|(?<three_3>mouse)|(?<four_4>fish)

I was able to use Groups["one_1"].Value to get the matched group.

EDIT: Example:

string pattern = "(?<one_1>cat)|(?<two_2>dog)|(?<three_3>mouse)|(?<four_4>fish)";
string[] inputs = new[]{"cat", "horse", "dog", "dolphin", "mouse", "hamster", "fish"};
string[] groups = new[]{"one_1", "two_2", "three_3", "four_4"};

foreach(string input in inputs)
{
    Match oMatch = Regex.Match(input, pattern, RegexOptions.IgnoreCase);

    Console.WriteLine("For input: {0}", input);

    foreach(string group in groups)
    {
        Console.WriteLine("Group {0}:\t{1}", group, oMatch.Groups[group].Value);    
    }
    Console.WriteLine("----------");
}

Using dashes, as in your original pattern, means the group name won't be found. I'm assuming group names follow the same naming rules as variables in the rest of .NET, so if you couldn't use it as a legal variable name, don't use it as a group name.
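
If you want to check which names the engine actually registered for a given pattern, something like the sketch below may help. GroupNameProbe and NamesIn are names I made up for illustration; the only library call it relies on is Regex.GetGroupNames. It deliberately makes no assumption about whether the dashed pattern is rejected or merely parsed differently, and returns an empty array if the constructor throws.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class GroupNameProbe
{
    // Returns the group names the regex engine registered for the pattern,
    // or an empty array if the pattern is rejected outright.
    public static string[] NamesIn(string pattern)
    {
        try
        {
            return new Regex(pattern).GetGroupNames();
        }
        catch (ArgumentException)
        {
            return new string[0];
        }
    }
}
```

Calling GroupNameProbe.NamesIn on the underscore version should list one_1 among the names, while the dashed version should never list one-1, which is why Groups["one-1"] comes back empty.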

Chris Doggett
I can't change the dashes. Or rather, I don't want to impose that constraint on the parameter.
Irwin
+3  A: 

(?<one-1>...) doesn't work because - is used for balancing groups, written (?<name1-name2>...). The documentation describes that construct as follows:

Deletes the definition of the previously defined group name2 and stores in group name1 the interval between the previously defined name2 group and the current group. If no group name2 is defined, the match backtracks. Because deleting the last definition of name2 reveals the previous definition of name2, this construct allows the stack of captures for group name2 to be used as a counter for keeping track of nested constructs such as parentheses. In this construct, name1 is optional. You can use single quotes instead of angle brackets; for example, (?'name1-name2').

You can't escape that minus sign, so you must use another separator.
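
To make the quoted description a bit more concrete, here is a minimal sketch of the "counter for nested parentheses" use it mentions. The class name BalancedParens and the group name open are my own choices for illustration; name1 is omitted, so the construct is just (?<-open>...).

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative sketch of a balancing group; names are hypothetical.
public static class BalancedParens
{
    // (?<open>\()    pushes a capture onto the "open" stack for each '('.
    // (?<-open>\))   pops one capture for each ')'; fails if the stack is empty.
    // (?(open)(?!))  fails at the end if any '(' is still unclosed.
    private static readonly Regex Balanced = new Regex(
        @"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");

    public static bool IsBalanced(string text)
    {
        return Balanced.IsMatch(text);
    }
}
```

With this, BalancedParens.IsBalanced("(a(b)c)") is true, while "(a(b)c" and "a)b" are both rejected. It is this meaning of the minus sign that gets in the way of a name like one-1.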

Rubens Farias
A: 

Something along the lines of the following?

string[,] patterns = {
    { "one-1", "cat" },
    { "two-2", "dog" },
    { "three-3", "mouse" },
    { "four-4", "fish" },
};

var regex = buildRegex(patterns);

string[] tests = { "foo", "dog", "bar", "fish" };
foreach (var t in tests) {
    var m = regex.Match(t);
    Console.WriteLine("{0}: {1}", t, reportMatch(regex, m));
}

Output

foo: no match
dog: two-2 = dog
bar: no match
fish: four-4 = fish

First we build up a Regex instance by escaping the group names and combining them with the patterns. Any non-word character, and the underscore itself so that the escaping stays reversible, is replaced with the sequence _nnn_ where nnn is its UTF-32 value in decimal.

private static Regex buildRegex(string[,] inputs)
{   
    string regex = ""; 
    for (int i = 0; i <= inputs.GetUpperBound(0); i++) {
        var part = String.Format(
            "(?<{0}>{1})",
            Regex.Replace(inputs[i,0], @"([\W_])", new MatchEvaluator(escape)),
            inputs[i,1]);

        regex += (regex.Length != 0 ? "|" : "") + part;
    }   

    return new Regex(regex);
}   

private static string escape(Match m)
{
    return "_" + Char.ConvertToUtf32(m.Groups[1].Value, 0) + "_";
}   

For matches, the .NET library doesn't give us an easy way to get a group's name, so we have to go the other way: for each group name, we check whether that group matched and if so unescape its name and let the caller know both name and captured substring.

private static string reportMatch(Regex regex, Match m)
{   
    if (!m.Success)
        return "no match";

    foreach (var name in regex.GetGroupNames()) {
        if (name != "0" && m.Groups[name].Success)
            return String.Format(
                       "{0} = {1}",
                       Regex.Replace(name, @"_(\d+)_",
                           new MatchEvaluator(unescape)),
                       m.Groups[name].Value);
    }

    return null;
}   

private static string unescape(Match m)
{   
    return Char.ConvertFromUtf32(int.Parse(m.Groups[1].Value));
}   
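
As a sanity check on the escaping scheme, here is a small stand-alone sketch of the round trip. GroupNameEscaping, Escape and Unescape are hypothetical names that mirror the escape and unescape helpers above; it assumes labels contain only characters from the Basic Multilingual Plane.

```csharp
using System;
using System.Text.RegularExpressions;

// Stand-alone version of the escaping idea, for illustration only.
public static class GroupNameEscaping
{
    // Replaces every non-word character and every underscore with _nnn_,
    // where nnn is the character's code point in decimal.
    public static string Escape(string label)
    {
        return Regex.Replace(label, @"[\W_]",
            m => "_" + Char.ConvertToUtf32(m.Value, 0) + "_");
    }

    // Reverses Escape by turning each _nnn_ back into its character.
    public static string Unescape(string groupName)
    {
        return Regex.Replace(groupName, @"_(\d+)_",
            m => Char.ConvertFromUtf32(int.Parse(m.Groups[1].Value)));
    }
}
```

For example, Escape("one-1") gives "one_45_1" (45 is the code point of '-'), and Unescape("one_45_1") gives back "one-1". Escaping the underscore as well is what keeps a label such as a_45_b from being mistaken for an escaped dash.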
Greg Bacon
You might want to look at this for another way to get the group names: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.getgroupnames.aspx
Ahmad Mageed
@Ahmad Thanks! Updated.
Greg Bacon
I went with a variation of this. Thanks
Irwin
@Irwin You're welcome! I'm glad it helps.
Greg Bacon
A: 

I'm not clear on what you want the end result to be, but the following will map the value to the original group names. From there you can determine how to proceed.

Give this a try:

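// Needs: using System; using System.Collections.Generic; using System.Linq;
//        using System.Text.RegularExpressions;
var map = new Dictionary<string, string>()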
{
    {"A", "one-1"},
    {"B", "two-2"},
    {"C", "three-3"},
    {"D", "four-4"}
};

string[] inputs = { "cat", "dog", "mouse", "fish", "bird" };
string pattern = "(?<A>cat)|(?<B>dog)|(?<C>mouse)|(?<D>fish)";

Regex rx = new Regex(pattern);
foreach (string input in inputs)
{
    Match m = rx.Match(input);
    if (m.Success)
    {
        string groupName = rx.GetGroupNames()
                             .Where(g => g != "0" && m.Groups[g].Value != "")
                             .Single();
        Console.WriteLine("Match: {0} -- Group name: {1} -- Corresponds to: {2}",
                            input, groupName, map[groupName]);
    }
    else
    {
        Console.WriteLine("Failed: {0}", input);
    }
}

The Regex.GetGroupNames method provides an easy way to extract the group names from the pattern. A group that did not take part in the match has an empty string as its Value. The idea behind this approach is to loop through (LINQ through) the group names, ignoring the default "0" group, and pick the one whose value is non-empty. That's the group we're after.
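
If the labels arrive as parameters and you'd rather not maintain the map by hand, the same idea can be sketched with generated group names. Everything named here (LabelledAlternation, Build, LabelFor, the g0, g1, ... naming) is made up for illustration, and it assumes the supplied patterns don't define named groups of their own that could collide with the generated ones.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: labels may contain any characters, because they
// never appear inside the pattern itself.
public static class LabelledAlternation
{
    // Builds (?<g0>p0)|(?<g1>p1)|... and records which label each
    // generated group name stands for.
    public static Regex Build(IList<KeyValuePair<string, string>> labelledPatterns,
                              out Dictionary<string, string> labelByGroup)
    {
        labelByGroup = new Dictionary<string, string>();
        var parts = new List<string>();
        for (int i = 0; i < labelledPatterns.Count; i++)
        {
            string groupName = "g" + i;
            labelByGroup[groupName] = labelledPatterns[i].Key;
            parts.Add("(?<" + groupName + ">" + labelledPatterns[i].Value + ")");
        }
        return new Regex(string.Join("|", parts.ToArray()));
    }

    // Returns the label of the alternative that matched, or null if none did.
    public static string LabelFor(Regex rx, Dictionary<string, string> labelByGroup,
                                  string input)
    {
        Match m = rx.Match(input);
        if (!m.Success)
            return null;

        foreach (string name in rx.GetGroupNames())
        {
            if (labelByGroup.ContainsKey(name) && m.Groups[name].Success)
                return labelByGroup[name];
        }
        return null;
    }
}
```

Given the pairs ("one-1", "cat"), ("two-2", "dog") and so on, LabelFor returns "two-2" for the input "dog" and null for "bird".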

Ahmad Mageed