In C# should you have code like:

public static string importantRegex = "magic!";

public void F1(string input){
  //code
  if(Regex.IsMatch(input, importantRegex)){
    //codez in here.
  }
  //more code
}
public void main(){
  F1("first input");
/*
  some stuff happens......
*/
  F1("second input");
}

or should you persist an instance of a Regex containing the important pattern? What is the cost of using Regex.IsMatch? I imagine there is an NFA created in each Regex instance. From what I understand, this NFA creation is nontrivial.

+8  A: 

If you're going to reuse the regular expression multiple times, I'd create it with RegexOptions.Compiled and cache it. There's no point in making the framework parse the regex pattern every time you want it.

Jon Skeet
@Jon Skeet, what do you mean by cache it? How would you do that?
Simucal
I mean "keep it in a static readonly variable".
Jon Skeet
+1  A: 

I agree with Jon and just to clarify it would look something like this:

static readonly Regex regex = new Regex("regex", RegexOptions.Compiled);

It's also worthwhile to look at the RegexOptions enum for other flags that can be helpful at times.

Andrew Hare
typo: It's "RegexOptions.Compiled"
Joel Mueller
corrected - thanks!
Andrew Hare
+7  A: 

The static IsMatch function is defined as follows:

public static bool IsMatch(string input, string pattern){
    return new Regex(pattern).IsMatch(input);
}

And, yes, initialization of a Regex object is not trivial. You should use the static IsMatch (or any of the other static Regex functions) as a quick shortcut only for patterns that you will use only once. If you will reuse the pattern, it's worth it to reuse a Regex object, too.
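
As a minimal sketch of the reuse described above (the class and method names here are hypothetical; the pattern is the one from the question), the pattern can be parsed once into a static readonly field and the instance method called on each use:

```csharp
using System.Text.RegularExpressions;

public static class MagicChecker
{
    // Constructed once, when the class is first used; every call below reuses it.
    private static readonly Regex ImportantRegex = new Regex("magic!");

    // Reuses the stored instance, so the pattern is not parsed again per call.
    public static bool ContainsMagic(string input)
    {
        return ImportantRegex.IsMatch(input);
    }

    // One-off shortcut: reasonable for a pattern that is only used once.
    public static bool ContainsMagicOneOff(string input)
    {
        return Regex.IsMatch(input, "magic!");
    }
}
```

Both methods return the same result for the same input; they differ only in where the Regex object comes from.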

As to whether or not you should specify RegexOptions.Compiled, as suggested by Jon Skeet, that's another story. The answer there is: it depends. For simple patterns or for patterns used only a handful of times, it may well be faster to use a non-compiled instance. You should definitely profile before deciding. The cost of compiling a regular expression object is quite large indeed, and may not be worth it.


Take, as an example, the following:

const int count = 10000;

string pattern = "^[a-z]+[0-9]+$";
string input   = "abc123";

Stopwatch sw = Stopwatch.StartNew();
for(int i = 0; i < count; i++)
    Regex.IsMatch(input, pattern);
Console.WriteLine("static took {0} seconds.", sw.Elapsed.TotalSeconds);

sw.Reset();
sw.Start();
Regex rx = new Regex(pattern);
for(int i = 0; i < count; i++)
    rx.IsMatch(input);
Console.WriteLine("instance took {0} seconds.", sw.Elapsed.TotalSeconds);

sw.Reset();
sw.Start();
rx = new Regex(pattern, RegexOptions.Compiled);
for(int i = 0; i < count; i++)
    rx.IsMatch(input);
Console.WriteLine("compiled took {0} seconds.", sw.Elapsed.TotalSeconds);

At count = 10000, as listed, the second variant (the non-compiled instance) is fastest. Increase count to 100000, and the compiled version wins.

P Daddy
If the regex will be reused will not the compilation cost be greatly outweighed by the cost of re-parsing the regex on repeated use?
Andrew Hare
The regex is not reparsed on every use if a single instance is used. It would only be reparsed if you use a new instance each time (as would happen if you use the static function).
P Daddy
Let's say you store your regex object in a static readonly variable in a long-lived application. If we ignore start-up time (by moving sw.Start() after the regex constructor), the compiled regex actually executes an order of magnitude faster in your test code.
Joel Mueller
Not to be flippant, but then you might as well just move the entire loop before sw.Start(), and call it all start-up time. Run-time cost is run-time cost. Making the user wait while something initializes is the same as making the user wait while something else executes.
P Daddy
And by the way, with **both** constructor calls outside the timer start, I get much less than an order of magnitude difference between compiled and not compiled at 10000 iterations on my machine. Not-compiled takes about 0.011, and compiled takes about 0.01. At count=1000, compiled is again slower.
P Daddy
+2  A: 

I suggest you read Jeff's post on compiling Regex.

As for the question, if you are asking this question it means that you are going to use it just once. So, it really does not matter as the Reflector's disassembly of Regex.IsMatch is:

public static bool IsMatch(string input, string pattern, RegexOptions options)
{
    return new Regex(pattern, options, true).IsMatch(input);
}
Recep
+1 for the link to the post on the subject
P Daddy
A: 

For a WinForms application I was working on, we could define a regex of valid characters that would run on every keystroke, plus a validation regex for the text of any textbox (it was a data entry application), so I used a cache of compiled regexes such as

  private static Dictionary<string, Regex> regexCache = new Dictionary<string, Regex>(20);

The regex pattern was the key.

Then I had a static function I could call when validating data:

public static bool RegExValidate(string text, string regex)
{
  Regex compiledRegex;
  // Look the pattern up once; compile and cache it only on first use.
  if (!regexCache.TryGetValue(regex, out compiledRegex))
  {
    compiledRegex = new Regex(regex, RegexOptions.Compiled);
    regexCache.Add(regex, compiledRegex);
  }
  return compiledRegex.IsMatch(text);
}
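
One caveat: a plain Dictionary is not safe for concurrent writes, so the cache above should only be touched from a single thread (such as the WinForms UI thread). A hedged sketch of a variant that tolerates multiple threads, assuming .NET 4 or later for ConcurrentDictionary; the class name RegexValidator is hypothetical:

```csharp
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

public static class RegexValidator
{
    private static readonly ConcurrentDictionary<string, Regex> RegexCache =
        new ConcurrentDictionary<string, Regex>();

    public static bool RegExValidate(string text, string pattern)
    {
        // GetOrAdd returns the cached instance, or builds and stores one on first use.
        // Under contention the factory may run more than once, but only one result is kept.
        Regex compiled = RegexCache.GetOrAdd(
            pattern, p => new Regex(p, RegexOptions.Compiled));
        return compiled.IsMatch(text);
    }
}
```

Usage is the same as before, for example RegexValidator.RegExValidate("abc123", "^[a-z]+[0-9]+$"). Note that neither version ever evicts entries, so this only suits a small, fixed set of patterns.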
benPearce
A: 

Hi, what would the pattern be to check that a string value contains only letters from [a-z] or [A-Z] or digits from [0-9], using a regex function in C#? If someone sees this comment, please write back.

I know that in the pattern "^[a-z]+[0-9]+$":

^ anchors the match at the start of the string, and the characters must come in a specified order: the string has to start with letters, followed by digits. But I don't know how to write a pattern that accepts letters and digits in any order. Something else has to be written after the "^" in the pattern instead. What is that symbol?

@Fuloplo, when you have a new question to ask, you should use the "Ask Question" button at the top of the page. I was all set to jump into a discussion about regex caching in .NET when I noticed the discussion ended over three months ago. (BTW: it's "^[A-Za-z0-9]+$")
Alan Moore
+1  A: 

There are many things that will affect the performance of using a regular expression. Ultimately, the only way to find the most performant approach for your situation is to measure, using as realistic a scenario as possible.

The page on compilation and reuse of regular expression objects on MSDN covers this. In summary, it says:

  1. Compiled regular expressions take time to compile, and once compiled will only have their memory released on AppDomain unloads. Whether you should use compilation or not will depend on the number of patterns you are using and how often they are used.

  2. Static Regex methods cache the parsed regular expression representation for the last 15 (by default) patterns. So if you aren't using many different patterns in your application, or your usage is sufficiently clustered, there won't be much difference between you caching the instance or the framework caching it.
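
The cache size mentioned in point 2 is exposed as the static property Regex.CacheSize. A small sketch, assuming an application that cycles through more distinct patterns than the default allows; the class name, method name, and the value 50 are made up for illustration:

```csharp
using System.Text.RegularExpressions;

public static class RegexCacheConfig
{
    // Intended to be called once at start-up, before the static
    // Regex methods (Regex.IsMatch, Regex.Replace, ...) are used heavily.
    public static void EnlargeStaticCache(int size)
    {
        // Sets the maximum number of entries kept in the static-method cache.
        Regex.CacheSize = size;
    }
}
```

For example, RegexCacheConfig.EnlargeStaticCache(50) raises the limit to 50 patterns. Whether that helps is, again, something to measure rather than assume.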

Ben Lings
I don't care if the thread *is* seven months old, this is the only answer that gets it right.
Alan Moore