Hi

I have a situation where my regular expressions compile extremely slowly on Windows Server 2008. I wrote a small console application to highlight this issue. The app generates its own input and builds up a Regex from words in an XML file. I built a release version of this app and ran it both on my personal laptop (running XP) and the Windows 2008 server. The regular expression took 0.21 seconds to compile on my laptop, but 23 seconds to compile on the server.

Any ideas what could be causing this? The problem occurs only on first use of the Regex (when it is first compiled); thereafter it is fine.

I have also found another problem - when using \s+ in the regular expression on the same Windows 2008 server, the memory balloons (uses 4GB+) and the compilation of the Regex never finishes.

Is there a known issue with Regex and 64 bit .net? Is there a fix/patch available for this? I cannot really find any info on the net, but I have found a few articles about this same issue in Framework 2.0 - surely this has been fixed by now?

More info: The server is running the 64 bit version of the .net framework (3.5 SP1) and on my laptop I have Visual Studio 2008 and the 3.5 framework installed. The regular expression is of the following pattern: ^word$|^word$|^word$ and is constructed with the following flags: RegexOptions.IgnoreCase | RegexOptions.Compiled

Edit: Here is a code snippet:

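// Namespaces used: System, System.Text, System.Text.RegularExpressions, System.Xml
StringBuilder regexString = new StringBuilder();
if (!String.IsNullOrEmpty(fileLocation))
{
    // Build one large alternation of anchored words: ^word$|^word$|...
    using (XmlTextReader textReader = new XmlTextReader(fileLocation))
    {
        textReader.Read();
        while (textReader.Read())
        {
            textReader.MoveToElement();
            if (textReader.Name == "word")
            {
                // The first attribute of each <word> element holds the text.
                regexString.Append("^" + textReader.GetAttribute(0) + "$|");
            }
        }
    }
    // Drop the trailing '|' before constructing the Regex.
    ProfanityFilter = new Regex(regexString.ToString(0, regexString.Length - 1), RegexOptions.IgnoreCase | RegexOptions.Compiled);
}

// Time only the first IsMatch call, which is where the delay shows up.
DateTime time = DateTime.Now;
Console.WriteLine("\nIsProfane:\n" + ProfanityFilter.IsMatch("test"));
Console.WriteLine("\nTime: " + (DateTime.Now - time).TotalSeconds);
Console.ReadKey();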

This results in a time of 0.21 seconds on my laptop and 23 seconds on the 2008 server. The XML file consists of 168 words in the following format:

<word text="test" />
A: 

Do you compile the code with optimization for 64 bit? By default VS compiles for Any CPU (32 & 64), and that might be the problem.

Freddy
I was unaware of that - I'll give it a try and let you know
pjmyburg
Nope, no difference unfortunately
pjmyburg
From what I can tell after reading this article (http://blogs.msdn.com/bclteam/archive/2004/11/12/256783.aspx), the first regex compilation always takes a long time. I am wondering whether your regex is still 23 seconds slow on the WS2k8 server when you run it a second time right after the first run?
Freddy
Yes, the regular expression is only compiled once and then stored in cache for the life of the application. The second and third time the regex is used, it is very fast. However, once the application exits, this regex is destroyed and will be compiled again on the next run. Ie, the app takes 23 seconds on every independent run. This first, initial compilation takes only 0.21 seconds on my laptop.
pjmyburg
Ok. I'm out of ideas for now, but maybe it would be worth checking out ngen (http://msdn.microsoft.com/en-us/library/6t9t5wcf(VS.80).aspx), which can help improve startup performance. I'm sorry I could not be of more assistance.
Freddy
Thanks for your help anyway :)
pjmyburg
+3  A: 

You can pre-compile your regexs using the Regex.CompileToAssembly method, and then you could deploy the compiled regexs to your server.
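
For anyone who wants to try this, here is a rough sketch of what the pre-compilation step could look like. It assumes the full .NET Framework (newer runtimes do not support Regex.CompileToAssembly), and the class name ProfanityRegex, the namespace Precompiled and the assembly name Precompiled.Regexes are made up for illustration.

```csharp
using System.Reflection;
using System.Text.RegularExpressions;

public static class RegexPrecompiler
{
    // Describes the regex that will be baked into the assembly.
    // "ProfanityRegex" and "Precompiled" are hypothetical names.
    public static RegexCompilationInfo Describe(string pattern)
    {
        return new RegexCompilationInfo(
            pattern,
            RegexOptions.IgnoreCase,
            "ProfanityRegex",   // name of the generated Regex-derived class
            "Precompiled",      // namespace of the generated class
            true);              // make the generated class public
    }

    // Writes Precompiled.Regexes.dll to the current directory.
    // Assumption: running on the .NET Framework, where this call is supported.
    public static void Build(string pattern)
    {
        Regex.CompileToAssembly(
            new[] { Describe(pattern) },
            new AssemblyName("Precompiled.Regexes"));
    }
}
```

You would run Build once at build or deploy time, reference the resulting DLL from the application, and then use new Precompiled.ProfanityRegex() in place of constructing the Regex at startup.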

ShellShock
Yes, but that means that the non-technical administrators of the service cannot just add a word to an XML file - the DLLs would need to be recompiled every time. Good suggestion though.
pjmyburg
I think he meant that after you read in the file, you use the RegexOptions.Compiled option to optimize the execution of the regex.
brianary
No, he meant you pre-compile the Regex to a DLL file (assembly) - that is what the CompileToAssembly method does. The RegexOptions.Compiled flag is the cause of this whole issue. That is indeed the way I would want to go, but it seems there is a bug in the 64 bit .net libraries.
pjmyburg
A: 

See this link; it points to a hotfix: http://csharpfeeds.com/post/5158/The%5FRegexOptions.Compiled%5FFlag%5Fand%5FSlow%5FPerformance%5Fon%5F64-Bit%5F.NET%5FFramework%5F2.0%5FJosh%5FFree.aspx

Happer
That fix is for Framework 2.0 and was apparently included in Framework 2.0 SP1 (maybe the patch was forgotten for 3.5 SP1?).
pjmyburg
+1  A: 

I found a solution; granted, not the correct one, but perfect in my case. For some reason, if I leave out the RegexOptions.Compiled flag, the Regex is much, much faster. I even managed to execute the Regex on 100 long phrases in under 65 milliseconds on the 2008 server.

This must be a bug in the .net lib as the uncompiled version is supposed to be much slower than the compiled version. Either way, under 1 millisecond per check is very much acceptable for me :)
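
If you want to compare the two modes on your own machine, a small helper along these lines might do. It is only a sketch: the class and method names (RegexTiming, TimeFirstMatch) are made up, and it measures construction plus the first IsMatch call together, since that is where the delay shows up.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    // Times constructing the regex plus its first IsMatch call.
    // 'matched' receives the result of that first match.
    public static TimeSpan TimeFirstMatch(
        string pattern, RegexOptions options, string input, out bool matched)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        Regex regex = new Regex(pattern, options);
        matched = regex.IsMatch(input);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Call it once with RegexOptions.IgnoreCase and once with RegexOptions.IgnoreCase | RegexOptions.Compiled on the same pattern and input, and print both TimeSpan values to see the difference on each machine.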

pjmyburg
You may also want to experiment with more alternative regex patterns to find the optimal one, such as /^(word|word|word|word)$/ instead of /^word$|^word$|^word$/ .
brianary
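
A sketch of how that grouped pattern could be built from the word list. Two assumptions here that are not in the original code: the words are treated as literal text and passed through Regex.Escape (skip that if the XML entries are meant to contain regex syntax such as \s+), and a non-capturing group (?:...) is used instead of a plain group. PatternBuilder and BuildGrouped are made-up names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    // Builds ^(?:word1|word2|...)$ with the anchors written once
    // instead of once per word.
    public static string BuildGrouped(IEnumerable<string> words)
    {
        // Assumption: each word is a literal, so regex metacharacters are escaped.
        string[] escaped = words.Select(w => Regex.Escape(w)).ToArray();
        return "^(?:" + string.Join("|", escaped) + ")$";
    }
}
```

The result can be passed to new Regex(pattern, RegexOptions.IgnoreCase) just like the original alternation.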
Yes, I am aware of that. Like I mentioned in the original question, I wrote a console application merely to highlight the problem. That exact same Regex compiles in 0.21 seconds on my laptop, so it should not need to compile for 23 seconds on a 64 bit server.
pjmyburg
Had the same issue and solution. With the compiled option set it ran fine on my local XP box, but when uploaded to the server it was taking 40+ seconds per regex. I removed the compiled option and 8 calls now take less than 1 second in total.
ManiacZX
A: 

I ran into the exact same problem. My app works fine on x86 machines but memory balloons and hangs on x64. Removing the compilation flag did not help. I tried this today on .net 4.0 and the problem remains. If you have a repro, I suggest you file a bug.

I think MSFT knows about this, see the bottom comment here: https://connect.microsoft.com/VisualStudio/feedback/details/508748/memory-consumption-alot-higher-on-x64-for-xslcompiledtransform-transform-then-on-x86

But let them decide if this is the same bug. Please add a link to your filing here if you file so I can add my comments to it.

Thanks!