ansaurus

Question

Answer 1

A:

If you only want one instance, change

string expression = "(\\{[0-9]+\\})"; // one or more repetitions

to

string expression = "(\\{[0-9]{1}\\})";  // exactly 1 repetition

mcauthorn 2009-11-04 14:19:20

Not going to work. Tokens {10}, {11}, etc. will no longer match and multiple instances of {0}, {1} to {9} will still be captured if they exist.

Steve Crane 2009-11-04 14:48:19

Also, if you only want to match a single digit, the {1} count specifier is redundant.

Steve Crane 2009-11-04 14:52:03
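
The two objections above can be checked with a small sketch. The class name QuantifierDemo, the helper MatchValues and the sample input are made up for illustration; only the two patterns come from this answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: returns every match of the pattern, in order of appearance.
    public static string[] MatchValues(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()          // needed where MatchCollection is non-generic
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

With the sample input "{0} {10} {0} {1}", the pattern \{[0-9]+\} yields {0}, {10}, {0}, {1}, while \{[0-9]{1}\} yields {0}, {0}, {1}: the two-digit token is lost and the repeated {0} is still captured.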

Answer 2

+1 A:

Regular expressions solve lots of problems, but not every problem. How about using other tools in the toolbox?

var parameters = new HashSet<string>(
    matches.Cast<Match>().Select(mm => mm.Value).Skip(1));

Or

var parameters = matches.Cast<Match>().Select(mm => mm.Value).Skip(1).Distinct();

sixlettervariables 2009-11-04 15:48:48

Meta comment, the 0th match is the entire matching corpus.

sixlettervariables 2009-11-04 15:49:50

I was thinking of something like this to make the matches unique after the regex does its work. Just wondered if the regex itself might have some magic to do this itself without additional code. See my answer for the solution I came up with.

Steve Crane 2009-11-04 16:13:06

Sometimes you can finagle what you want out of Regex, but often at the cost of readability or performance. I tend to take the easy route and see if I need more out of it :-D

sixlettervariables 2009-11-04 18:14:40
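
For reference, a minimal self-contained sketch of the "make it unique after the regex has run" approach. The class DistinctTokens, the method Find and the single-token pattern are assumptions for illustration, not code from the thread. One deliberate difference: the sketch does not call Skip(1), because each element of a MatchCollection is an actual match; the whole-match text of a single Match is its Groups[0].

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DistinctTokens
{
    // Hypothetical helper: collects each numbered token once.
    public static List<string> Find(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\{[0-9]+\}");

        // Cast<Match>() is required on older frameworks, where
        // MatchCollection only implements the non-generic IEnumerable.
        return matches.Cast<Match>()
                      .Select(m => m.Value)
                      .Distinct()
                      .ToList();
    }
}
```

Calling DistinctTokens.Find("{0} {1} {0} {10} {1}") should give the three tokens {0}, {1} and {10}, each once.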

Answer 3

A:

Here is what I came up with.

private static bool TokensMatch(string t1, string t2)
{
  return TokenString(t1) == TokenString(t2);
}

private static string TokenString(string input)
{
  Regex tokenParser = new Regex(@"(\{[0-9]+\})|(\[.*?\])");

  string[] tokens = tokenParser.Matches(input).Cast<Match>()
      .Select(m => m.Value).Distinct().OrderBy(s => s).ToArray<string>();

  return String.Join(String.Empty, tokens);
}

Note that the regular expression differs from the one in my question because I cater for two types of token: numbered ones delimited by {} and named ones delimited by [].

Steve Crane 2009-11-04 16:16:08

RegexOptions.Compiled may help along with moving that Regex out of the method and making it static.

sixlettervariables 2009-11-04 18:15:54
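
A sketch of what that suggestion might look like, assuming the same pattern and logic as the answer. The class name TokenComparer and the public visibility are choices made here for illustration, and whether RegexOptions.Compiled pays off depends on how often the method is called.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenComparer
{
    // Built once and reused across calls; RegexOptions.Compiled trades
    // a higher construction cost for faster matching afterwards.
    private static readonly Regex TokenParser =
        new Regex(@"(\{[0-9]+\})|(\[.*?\])", RegexOptions.Compiled);

    public static bool TokensMatch(string t1, string t2)
    {
        return TokenString(t1) == TokenString(t2);
    }

    public static string TokenString(string input)
    {
        // Distinct, sorted tokens joined into one string, as in the answer.
        string[] tokens = TokenParser.Matches(input).Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .OrderBy(s => s)
            .ToArray();

        return string.Join(string.Empty, tokens);
    }
}
```

For example, TokensMatch("Hello {0}, [name] {0}", "[name] and {0}") should be true, since both strings contain the same set of tokens, while TokensMatch("{0} {1}", "{0}") should be false.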

Answer 4

+1 A:

Here's something you could use for a pure regex solution:

Regex r = new Regex(@"(\{[0-9]+\}|\[[^\[\]]+\])(?<!\1.*\1)",
                    RegexOptions.Singleline);

But for the sake of both efficiency and maintainability, you're probably better off with a mixed solution like the one you posted.

Alan Moore 2009-11-05 05:59:33

Thanks Alan. I will stay with my current solution but it's good to expand my knowledge of regular expressions.

Steve Crane 2009-11-05 10:15:24

Doing the distinct checking outside the regex is faster too. I tested this by changing the expression and removing the Distinct() call: it returns the same result but takes almost twice as long. A good reminder that overusing regular expressions, or any tool, may not always be the best solution.

Steve Crane 2009-11-05 10:42:34
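
To compare the two approaches side by side, something like the following could be used. The class DedupComparison and its method names are hypothetical; the first pattern is the one from this answer, and the second is the same expression with the lookbehind removed. This sketch only checks that the results agree and does not reproduce the timing measurement.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class DedupComparison
{
    // Pattern from the answer: the negative lookbehind rejects a token
    // when the same text already occurred earlier in the input.
    private static readonly Regex PureRegex = new Regex(
        @"(\{[0-9]+\}|\[[^\[\]]+\])(?<!\1.*\1)", RegexOptions.Singleline);

    // Same token shapes without the lookbehind; duplicates are removed in code.
    private static readonly Regex PlainRegex = new Regex(
        @"\{[0-9]+\}|\[[^\[\]]+\]");

    public static string[] ViaRegex(string input)
    {
        return PureRegex.Matches(input).Cast<Match>()
                        .Select(m => m.Value).ToArray();
    }

    public static string[] ViaDistinct(string input)
    {
        return PlainRegex.Matches(input).Cast<Match>()
                         .Select(m => m.Value).Distinct().ToArray();
    }
}
```

For an input such as "{0} [a] {0} {1} [a]", both methods should return the same three tokens.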

Preventing duplicate matches in RegEx
