ansaurus

Question

performance issues with finding nth occurrence of a character with a regular expression

Answer 1

+9 A:

If you're really only looking for character repetitions, and not string repetitions, then you should be able to replace your method with something simple like:

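// Extension methods must be declared in a non-nested static class;
// the class name here is arbitrary.
public static class StringExtensions
{
   // Returns the zero-based index of the n-th (1-based) occurrence of
   // testChar in target, or -1 if there are fewer than n occurrences.
   public static int NthIndexOf(this string target, char testChar, int n)
   {
      int count = 0;

      for (int i = 0; i < target.Length; i++)
      {
         if (target[i] == testChar)
         {
            count++;
            if (count == n) return i;
         }
      }

      return -1;
   }
}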

and use that. It should have far fewer limitations.

As for why your original regex is going slow, here's what I suspect:

In your fast case, it works because it can find a match on its first pass through (with each group matching exactly one character).
In the slow case, it can't find a match (and never will, because there aren't enough semicolons to satisfy the regex), so it backtracks through every possible way to break up the string, which is a very expensive operation.
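A small experiment along these lines, as a sketch only: the question's original regex isn't reproduced on this page, so the pattern below, `^(?:.*?;){n}`, is a hypothetical stand-in with the same kind of ambiguity (`.` can also match `;`, so each group can end at any later semicolon). When there are enough semicolons, the first attempt succeeds; when there aren't, the engine may try roughly 2^k ways of splitting a string with k semicolons. The match timeout (a `Regex` constructor overload available since .NET 4.5) is only there so the failing case can't run away; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not the code from the question.
public static class BacktrackingDemo
{
    // Finds the n-th occurrence of c with a deliberately ambiguous lazy pattern.
    // Returns -1 if there is no match or if the timeout expires.
    public static int NthIndexOfLazyRegex(string input, char c, int n, TimeSpan timeout)
    {
        if (n <= 0) return -1;

        string ch = Regex.Escape(c.ToString());
        string pattern = "^(?:.*?" + ch + "){" + n + "}";
        var regex = new Regex(pattern, RegexOptions.Singleline, timeout);

        try
        {
            Match m = regex.Match(input);
            // With lazy groups, the first successful match ends exactly at the n-th c.
            return m.Success ? m.Index + m.Length - 1 : -1;
        }
        catch (RegexMatchTimeoutException)
        {
            // The failing case can explore an exponential number of splits.
            return -1;
        }
    }
}
```

Calling it with too few semicolons (for example 24 semicolons and `n = 25`) either fails after a lot of backtracking or hits the timeout; both report -1 here.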

Daniel LeCheminant 2009-04-15 18:08:33

+1 - don't ever use regex when you're doing simple string operations. Use built-in functions. This one might be even faster if it used something like this (I guess .NET has some internal IndexOf): int index = -1; while (n-- > 0) { index = target.IndexOf(testChar, index + 1); if (index == -1) return -1; } return index;

viraptor 2009-04-15 18:43:18
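For reference, viraptor's one-liner expanded into a compilable method. This is a sketch; the class and method names are hypothetical. It relies on the `String.IndexOf(char, int)` overload, which searches from `startIndex` onward and accepts a `startIndex` equal to the string length (returning -1).

```csharp
using System;

// Hypothetical class name; any non-nested static class can hold extension methods.
public static class IndexOfLoopExtensions
{
    // Returns the index of the n-th (1-based) occurrence of testChar, or -1.
    // Each iteration resumes the search one position past the previous hit.
    public static int NthIndexOfViaIndexOf(this string target, char testChar, int n)
    {
        if (target == null) throw new ArgumentNullException(nameof(target));
        if (n <= 0) return -1;

        int index = -1;
        while (n-- > 0)
        {
            index = target.IndexOf(testChar, index + 1);
            if (index == -1) return -1; // fewer than n occurrences
        }
        return index;
    }
}
```

Starting `index` at -1 makes the first search begin at position 0, which is the detail the inline version needs to get right.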

@viraptor: +1 that comment. .NET of course has IndexOf().

Tomalak 2009-04-15 18:48:02

I expect internal optimizations here... Sure enough, Reflector reveals: public extern int IndexOf(char value, int startIndex, int count);

Tomalak 2009-04-15 18:59:01

Well, initially this was meant to be more general-purpose. I do agree with you, however.

Steve 2009-04-15 19:02:08

Answer 2

+2 A:

Try to use a more specific and efficient regular expression:

"^(?:[^" + value + "]*" + value + "){" + (n - 1) + "}([^" + value + "]*)"

This will build the following regular expression for tempstring.NthIndexOf(";", 1593):

^(?:[^;]*;){1592}([^;]*)
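A sketch of how the index could be read back out of that match. Group 1 starts right after the (n-1)-th separator and stops at the next separator or at the end of the string, so if it stops before the end, the character right after it is the n-th occurrence. The names are hypothetical, and one deviation is an assumption of this sketch: the character is written as a `\uXXXX` escape rather than concatenated directly, so characters such as `]`, `^` or `\` stay literal inside and outside the character class.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper built on the ^(?:[^c]*c){n-1}([^c]*) pattern.
public static class NegatedClassRegex
{
    public static int NthIndexOfRegex(string input, char c, int n)
    {
        if (n <= 0) return -1;

        // \uXXXX is understood by .NET regex both inside and outside [...].
        string ch = "\\u" + ((int)c).ToString("X4");
        string pattern = "^(?:[^" + ch + "]*" + ch + "){" + (n - 1) + "}([^" + ch + "]*)";

        Match m = Regex.Match(input, pattern);
        if (!m.Success) return -1; // fewer than n-1 occurrences

        // Group 1 ends either at the n-th occurrence or at the end of the input.
        Group g = m.Groups[1];
        int end = g.Index + g.Length;
        return end < input.Length ? end : -1;
    }
}
```

Because `[^;]*` can never consume a `;`, each repetition has only one way to match, which is why this pattern avoids the blow-up of a pattern where groups can overlap.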

But this will only work for a single character as the separator.

Another approach would be to step through each character and count the occurrences of the character you are looking for.

Gumbo 2009-04-15 18:10:27

Note that that only works if "value" is a single character...

David Zaslavsky 2009-04-15 18:12:23

@David: Thanks for your remark.

Gumbo 2009-04-15 18:26:28

+1 for providing a faster regex. Does .NET regex support atomic grouping (?>...)? This could help improve performance. I'm not sure if possessive quantifiers ({1592}+) are supported, but they would speed up the process as well.

Tomalak 2009-04-15 18:45:42
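On the atomic grouping question: .NET does support `(?>...)`. As far as I know it does not support possessive quantifiers, so an atomic group is the way to get that effect. Below is a hedged sketch of Gumbo's pattern with each `[^;]*` run made atomic; the names and the `\uXXXX` escaping are assumptions of this sketch. Since `[^;]*` already has only one way to match, I'd expect the gain over Gumbo's version to be modest and mostly limited to failing matches.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: Gumbo's pattern with atomic groups around each run.
public static class AtomicGroupRegex
{
    public static int NthIndexOfAtomic(string input, char c, int n)
    {
        if (n <= 0) return -1;

        string ch = "\\u" + ((int)c).ToString("X4");
        // (?>[^c]*) never gives characters back once it has matched.
        // Group 1 is the only capturing group: ((?>[^c]*)).
        string pattern = "^(?:(?>[^" + ch + "]*)" + ch + "){" + (n - 1) + "}((?>[^" + ch + "]*))";

        Match m = Regex.Match(input, pattern);
        if (!m.Success) return -1;

        Group g = m.Groups[1];
        int end = g.Index + g.Length;
        return end < input.Length ? end : -1;
    }
}
```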
