+5  Q: 

Regex headache...

I want to validate some C# source code for a scripting engine. I want to make sure that only System.Math class members may be referenced. I am trying to create a regular expression that will match a dot, followed by a capital letter, followed by any number of word characters, ending at a word boundary that is NOT preceded by System.Math.

I started with this:

(?<!Math)\.[A-Z]+[\w]*

Which works fine for:

return Math.Max(466.89/83.449 * 5.5);  // won’t flag this
return Xath.Max(466.89/83.449 * 5.5);  // will flag this

It correctly matches .Max when it is not preceded by Math. However, now that I'm trying to expand the regular expression to include System, I can't get it to work.
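A quick way to confirm that behaviour is to run the first pattern over the two sample lines. This is a sketch that uses Python's `re` module in place of .NET's `Regex`. The helper name `flagged` is hypothetical.

```python
import re

# The question's first pattern: a dot, then a capitalized name,
# unless the dot is immediately preceded by "Math".
PATTERN = re.compile(r"(?<!Math)\.[A-Z]+[\w]*")

def flagged(line):
    """Return every substring of `line` that the pattern matches."""
    return [m.group(0) for m in PATTERN.finditer(line)]

print(flagged("return Math.Max(466.89/83.449 * 5.5);"))  # []
print(flagged("return Xath.Max(466.89/83.449 * 5.5);"))  # ['.Max']
```

The numeric literals are never flagged because the character after each of their dots is a digit, not `[A-Z]`.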

I've tried these permutations of the regular expression and more:

((?<!System\.Math)\.[A-Z]+[\w]*)
((?<!(?<!System)\.Math)\.[A-Z]+[\w]*)
((?<!System)\.(?<!Math)\.[A-Z]+[\w]*)
((?<!System)|(?<!Math)\.[A-Z]+[\w]*)
((?<!System\.Math)|(?<!Math)\.[A-Z]+[\w]*)

Using these statements:

return System.Math.Max(466.89/83.449 * 5.5);
return System.Xath.Max(466.89/83.449 * 5.5);
return Xystem.Math.Max(466.89/83.449 * 5.5);

I've tried everything that I could think of, but it either ALWAYS matches the second element (.Math or .Xath above) or it DOESN'T match ANYTHING.
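The first permutation shows why `.Math` keeps getting flagged. `(?<!System\.Math)` is tested at the dot being matched, so it only exempts a dot that comes right after `System.Math` (the one before `Max`). The dot before `Math` is preceded only by `System`, so the lookbehind succeeds and `.Math` itself is reported. Here is a sketch of that check, with Python's `re` standing in for .NET. The helper name `flagged` is hypothetical.

```python
import re

# First permutation from the question (outer capture group dropped).
PATTERN = re.compile(r"(?<!System\.Math)\.[A-Z]+[\w]*")

def flagged(line):
    """Return every substring of `line` that the pattern matches."""
    # The engine tries a match at every dot, so each member segment
    # is judged only by the text immediately before its own dot.
    return [m.group(0) for m in PATTERN.finditer(line)]

print(flagged("return System.Math.Max(466.89/83.449 * 5.5);"))  # ['.Math']
print(flagged("return System.Xath.Max(466.89/83.449 * 5.5);"))  # ['.Xath', '.Max']
print(flagged("return Xystem.Math.Max(466.89/83.449 * 5.5);"))  # ['.Math', '.Max']
```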

If anyone would have mercy on me and point out what I'm doing wrong, I would greatly appreciate it.

Thanks in advance, Welton

+2  A: 

If you are just looking for what you stated in the example, this regex will do it.

^[\w\s]*?[A-Z]\w+\.[A-Z]\w+\.(?<!System\.Math\.)

It matches every call to something OTHER than System.Math.XXX, as long as: a) the call contains two dots, and b) the call is on one line.

return System.Math.Max(466.89/83.449 * 5.5); // no match
return System.Xath.Max(466.89/83.449 * 5.5); // match
return Xystem.Math.Max(466.89/83.449 * 5.5); // match
System.Math.Max(466.89/83.449 * 5.5);  // no match
System.Xath.Max(466.89/83.449 * 5.5);  // match
Xystem.Math.Max(466.89/83.449 * 5.5);  // match
Math.Max(466.89/83.449 * 5.5);               // no match - only one '.'
System.Max.Math(466.89/83.449 * 5.5);        // match
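Because the pattern is anchored with `^`, it is meant to be applied to one line at a time. The sketch below checks it against some of the sample lines above, plus the single-dot case raised in the comments. It uses Python's `re` as a stand-in for .NET, and the helper name `is_flagged` is hypothetical.

```python
import re

# drewk's pattern: two capitalized, dot-separated names from the start of
# the line, rejected if the text just consumed ends with "System.Math.".
PATTERN = re.compile(r"^[\w\s]*?[A-Z]\w+\.[A-Z]\w+\.(?<!System\.Math\.)")

def is_flagged(line):
    """True if the line contains a two-dot call outside System.Math."""
    return PATTERN.search(line) is not None

samples = {
    "return System.Math.Max(466.89/83.449 * 5.5);": False,
    "return System.Xath.Max(466.89/83.449 * 5.5);": True,
    "return Xystem.Math.Max(466.89/83.449 * 5.5);": True,
    "Math.Max(466.89/83.449 * 5.5);": False,           # only one '.'
    "System.Max.Math(466.89/83.449 * 5.5);": True,
    "return Xath.Max(466.89/83.449 * 5.5);": False,    # single dot: not caught
}
for line, expected in samples.items():
    assert is_flagged(line) == expected, line
```

The last sample shows the limitation from condition a): a call with only one dot is never flagged.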

I agree with the comments, though: any regex here is fragile and should only be thought of as a text-editor-style aid. You need a parser if you want it to be bulletproof.

drewk
Doesn't work on return Xath.Max(466.89/83.449 * 5.5);
Richard Hein
@Richard: Did the OP state he ultimately wanted to match that? My understanding was that `Math.Max(466.89/83.449 * 5.5);` and `Xath.Max(466.89/83.449 * 5.5);` were intermediate steps in developing his regex...
drewk
Not sure ... if he says it's ok, then you get a point.
Richard Hein
Oh well, I'll give you a point anyways, because my attempts have failed quite miserably. LOL. You should see what I tried ... it would be negative points for me. ;)
Richard Hein
@Richard: Thanks!
drewk
+2  A: 

The trick is to make sure you never start matching a member name anywhere but at the beginning. Then it's a simple matter of using a lookahead to find out if whatever you're looking at starts with System.Math.. Try this regex:

(?<![\w.])(?!(?:System\.)?Math\.)(?:[A-Z]\w*\.)+[A-Z]\w*\b

The lookbehind ensures that the match doesn't start in the middle of a word (\w) or in the middle of a qualified member name (.). So if the lookahead fails, the regex can't just jump to the beginning of the next component (e.g., the Math. in System.Math.) and try again. It's all or nothing.
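To see the lookbehind's role, compare a variant whose lookahead requires the full `System.Math.` prefix (no optional group), with and without `(?<![\w.])`. The constant names `WITHOUT_GUARD` and `WITH_GUARD` are hypothetical, and Python's `re` stands in for .NET.

```python
import re

# Hypothetical variant: the lookahead rejects only the full "System.Math." prefix.
LOOKAHEAD = r"(?!System\.Math\.)(?:[A-Z]\w*\.)+[A-Z]\w*\b"
WITHOUT_GUARD = re.compile(LOOKAHEAD)
WITH_GUARD = re.compile(r"(?<![\w.])" + LOOKAHEAD)

line = "return System.Math.Max(466.89/83.449 * 5.5);"

# Without the lookbehind, the engine rejects a start at "System." and then
# retries at "Math.", where the lookahead no longer applies.
print(WITHOUT_GUARD.findall(line))  # ['Math.Max']

# With it, a match may only start where no word character or dot precedes.
print(WITH_GUARD.findall(line))     # []
```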

However, this will match Math.Max if it's not preceded by System.. Do you really need that, or was that just an intermediate step in developing a regex for the full name?

EDIT: I went ahead and made the System. part optional.
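Here is a sketch of the final regex (with `System.` optional) run against the question's test statements and the two shorter forms. It uses Python's `re` in place of .NET, and the helper name `disallowed_refs` is hypothetical.

```python
import re

# Final pattern from the answer, with "System." optional in the lookahead.
PATTERN = re.compile(r"(?<![\w.])(?!(?:System\.)?Math\.)(?:[A-Z]\w*\.)+[A-Z]\w*\b")

def disallowed_refs(source):
    """Return each qualified name in `source` that is not under (System.)Math."""
    return PATTERN.findall(source)

print(disallowed_refs("return System.Math.Max(466.89/83.449 * 5.5);"))  # []
print(disallowed_refs("return System.Xath.Max(466.89/83.449 * 5.5);"))  # ['System.Xath.Max']
print(disallowed_refs("return Xystem.Math.Max(466.89/83.449 * 5.5);"))  # ['Xystem.Math.Max']
print(disallowed_refs("return Math.Max(466.89/83.449 * 5.5);"))         # []
print(disallowed_refs("return Xath.Max(466.89/83.449 * 5.5);"))         # ['Xath.Max']
```

Unlike the `^`-anchored approach, this pattern has no line anchor. It can therefore be run over a whole script at once, and each offending name is reported in full.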

Alan Moore
That works too!
drewk