This is the line where Mono on Linux locks up (I am using the 2.6.4 VM distro from the official site):

var match = Regex.Match(sz, linkPattern);

The pattern is this; it is meant to capture the link and the title:

var linkPattern = @"<\ba\b[^\>]*\bhref\b*=\b*""([^""\>]*)""[^\>]*\btitle\b*=\b*""([^""\>]*) by [^""\>]*""";

When Mono hits that line it doesn't crash, throw an exception, or anything. Using top I see Mono at 96% CPU. I don't know how long the string is; I suspect it's under 8 KB (I tested a different URL). It has been a few minutes since I ran the code, so something must be broken.

+1  A: 

There are some bugs in Mono's regex implementation that can cause it to recurse infinitely. Probably the only fix is to rewrite your pattern to be a simpler regular expression, or not use regular expressions for this task.

You may also want to file a bug. I think there is a Google Summer of Code student currently working on Mono's regular expression engine.

jpobst
+2  A: 

"Too many \b's" was my first reaction. But really:

\b means word boundary. In my opinion, <\ba and <a should be identical. Also, \b* would therefore mean "zero or more repetitions of a word boundary", which makes little sense for a zero-width assertion.

I have rarely needed \b myself; I use \s? or \s* instead.

Did you try a different regex engine (Perl, PHP) to determine whether the lockup is due to Mono?

devio
You're right, the `\b` in `<\ba` is pointless. As for `\b*`, it looks like it's supposed to be `\s*`: zero or more whitespace characters.
Alan Moore
I don't know why I wrote that; it was old code. But \b* was the problem and \s* is the solution.
acidzombie24
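
For reference, here is a minimal sketch of the fix the comments arrive at: the pattern from the question with each `\b*` replaced by `\s*`, and the redundant `\b` after `<` and the unneeded `\>` escapes dropped. The class name `LinkTitleDemo`, the helper `TryExtract`, and the sample HTML string are made up for illustration; only `Regex.Match` and the capture groups come from the original code. It has not been checked against Mono 2.6.4 specifically.

```csharp
using System;
using System.Text.RegularExpressions;

static class LinkTitleDemo
{
    // Pattern from the question with \b* replaced by \s* (optional whitespace
    // around '='). Group 1 is the href value, group 2 is the title text
    // before the trailing " by ...".
    const string LinkPattern =
        @"<a\b[^>]*\bhref\s*=\s*""([^"">]*)""[^>]*\btitle\s*=\s*""([^"">]*) by [^"">]*""";

    // Hypothetical helper: returns true and fills link/title on a match.
    public static bool TryExtract(string html, out string link, out string title)
    {
        Match match = Regex.Match(html, LinkPattern);
        link = match.Success ? match.Groups[1].Value : null;
        title = match.Success ? match.Groups[2].Value : null;
        return match.Success;
    }

    static void Main()
    {
        // Made-up sample input; the real page content is not shown in the question.
        string sz = "<a class=\"x\" href=\"http://example.com/post/1\" title=\"Some Title by Someone\">";

        string link, title;
        if (TryExtract(sz, out link, out title))
            Console.WriteLine(link + " | " + title);
        else
            Console.WriteLine("no match");
    }
}
```

Note that the pattern still assumes `href` appears before `title` inside the tag and that both attributes use double quotes, exactly as the original did.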