What is going on behind the scenes when you mark a regular expression as one to be compiled? How does this differ from a cached regular expression?

Using this information, how do you determine when the cost of computation is negligible compared to the performance increase?

+9  A: 

This entry in the BCL Team Blog gives a nice overview: "Regular Expression performance".

In short, there are three types of regex (each executing at least as fast as the previous one):

  1. interpreted

    fast to create on the fly, slow to execute

  2. compiled (the one you seem to ask about)

    slower to create on the fly, fast to execute (good for execution in loops)

  3. pre-compiled

    created at compile time of your app (no run-time creation penalty), fast to execute

So, if you intend to execute the regex only once, or in a non-performance-critical section of your app (e.g. user input validation), you are fine with option 1.

If you intend to run the regex in a loop (e.g. line-by-line parsing of a file), you should go with option 2.

If you have many regexes that will never change for your app and are used intensely, you could go with option 3.
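
To decide whether the compile cost pays off for a given pattern, the simplest approach is probably to measure it. Below is a minimal sketch, assuming a made-up date pattern and helper names (`RegexCostSketch`, `Measure`, `Report` are hypothetical, not from the blog post). It times construction plus a number of matches for option 1 and option 2:

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexCostSketch
{
    // Hypothetical pattern, chosen only for illustration.
    public const string Pattern = @"^\d{4}-\d{2}-\d{2}$";

    // Times construction of the regex plus `iterations` matches
    // for the given options.
    public static TimeSpan Measure(RegexOptions options, string input, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        Regex regex = new Regex(Pattern, options);  // creation cost is paid here
        for (int i = 0; i < iterations; i++)
        {
            regex.IsMatch(input);                   // execution cost is paid here
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Prints both timings so they can be compared for a given iteration count.
    public static void Report(string input, int iterations)
    {
        Console.WriteLine("interpreted: " + Measure(RegexOptions.None, input, iterations));
        Console.WriteLine("compiled:    " + Measure(RegexOptions.Compiled, input, iterations));
    }
}
```

Calling `RegexCostSketch.Report("2009-01-31", n)` for a few values of `n` should show roughly where option 2 starts to win for that pattern. The numbers depend heavily on the pattern, the input, and the runtime, so treat them as a guide for your own case only. For option 3, the .NET Framework offers `Regex.CompileToAssembly`, which is not shown here.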

Tomalak
+2  A: 

It should be noted that regular expression performance since .NET 2.0 has been improved with an MRU cache of uncompiled regular expressions. The Regex code no longer reinterprets the regex every time.

So the penalty for compiling a regular expression on the fly is probably bigger than it first appears. In addition to the slower load time, it uses more memory to compile the opcodes.

Basically, the current advice is either not to compile them at all, or to compile them in advance into a separate assembly.
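
For the "don't compile them" route, here is a rough sketch of the two usual ways to avoid re-parsing an interpreted pattern. As far as I understand the documentation, the cache mentioned above serves the static `Regex` methods, and its size is exposed as `Regex.CacheSize` (default 15). The class name, method names, and pattern below are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCacheSketch
{
    // Hypothetical pattern, for illustration only.
    private const string Pattern = @"\bERROR\b";

    // One interpreted instance, constructed once and reused by every call.
    private static readonly Regex Shared = new Regex(Pattern);

    // Static call: the pattern is looked up in Regex's internal cache,
    // so repeated calls with the same pattern should not re-parse it.
    public static int CountWithStaticCalls(string[] lines)
    {
        int count = 0;
        foreach (string line in lines)
        {
            if (Regex.IsMatch(line, Pattern)) count++;
        }
        return count;
    }

    // Shared instance: no cache lookup involved at all.
    public static int CountWithSharedInstance(string[] lines)
    {
        int count = 0;
        foreach (string line in lines)
        {
            if (Shared.IsMatch(line)) count++;
        }
        return count;
    }

    // If many distinct patterns go through the static methods, the cache
    // can be enlarged so that fewer of them get evicted.
    public static void WidenCache(int entries)
    {
        Regex.CacheSize = entries;
    }
}
```

Both counting methods return the same result. The difference is only in where the parsed pattern lives: in the internal cache or in a field you control.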

Ref: BCL Team Blog Regular Expression performance [David Gutierrez]

Robert Paulson