A frequent issue in code reviews is whether a numeric value should be hard-coded in the code or not. Does anyone know of a nice regular expression that can catch 'magic numbers' in code like:

int overDue = 30;
Money fee = new Money(5.25D);

without also getting a ton of false positives like for loop initialization code?

for (int i = 0; i < array.length; i++) {

}
+1  A: 

Other than using a pre-built code analysis tool, the common approach is to look for all numbers outside a certain range, for example all numbers greater than 5 or less than -5. You'll find that doing this gets rid of the majority of false positives. If you want to be more aggressive you can use 3 instead of 5, but you'll get more false positives.
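As a rough illustration of the range idea, here is a minimal Java sketch. The class name ThresholdScan, the method outsideRange, and the regex are all made up for this example, and it assumes code is scanned one line at a time. It does not understand comments or string literals, so expect some noise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThresholdScan {
    // Integer or decimal literal, optionally followed by a Java type suffix,
    // that is not glued to an identifier or to a longer number.
    private static final Pattern NUMBER =
            Pattern.compile("(?<![\\w.])(-?\\d+(?:\\.\\d+)?)[dDfFlL]?(?![\\w.])");

    /** Returns the literals on one line whose absolute value exceeds limit. */
    public static List<String> outsideRange(String line, double limit) {
        List<String> hits = new ArrayList<>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            String literal = m.group(1); // numeric part without the suffix
            if (Math.abs(Double.parseDouble(literal)) > limit) {
                hits.add(literal);
            }
        }
        return hits;
    }
}
```

With a limit of 5, both lines from the question are reported, while the 0 in the for loop is skipped because it falls inside the allowed range.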

Stephane Grenier
+12  A: 

A better question would be which tools already do that. And the answer would be:

  • Checkstyle
  • FxCop

And many more static code analysis tools.

Loki
Agreed. We are using PMD and FindBugs, but they don't seem to flag these. I'll look at Checkstyle. I was hoping to create the rule in PMD.
Brian
A: 

For Java I'd get FindBugs and then write a custom bug detector for it to do the checking you need. For more info on writing a custom bug detector see this link.

P Arrayah
A: 

Here's a simple regex I use to scan for magic numbers in a large PHP project:

[^'"\w]-?[1-9]\d*[^'"\w]

This will match any nonzero integer, optionally negative, that's not surrounded by single or double quotes or word characters (letters, digits, underscore). Tweak for your own needs as desired.
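For anyone who wants to try this kind of pattern from Java, here is a small sketch. It assumes the minus sign in the pattern is optional (-?) and adds a capture group so that only the number is returned. The class name QuoteAwareScan and the method find are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareScan {
    // A number starting with 1-9 whose neighbours on both sides are neither
    // quotes nor word characters. Group 1 holds the number itself.
    private static final Pattern MAGIC =
            Pattern.compile("[^'\"\\w](-?[1-9]\\d*)[^'\"\\w]");

    /** Returns the integers on one line that the pattern flags. */
    public static List<String> find(String line) {
        List<String> hits = new ArrayList<>();
        Matcher m = MAGIC.matcher(line);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }
}
```

Because the pattern needs one character on each side of the number, a number at the very start or end of a line is not reported. It also sees only the integer part of a literal such as 5.25D.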

A: 

The SD Source Code Search Engine is an interactive tool for searching source code for many languages (C, C++, C#, Java, PHP, COBOL, FORTRAN, Python, ...). It understands the lexical syntax of each language at the same level of detail as the corresponding language compiler, so it can easily distinguish keywords, identifiers, numbers, operators, punctuation and whitespace.

The Search Engine can be given queries in terms of these entities and constraints on their values. It searches the code for all matches, displays them, and lets you inspect the source code for each match with a single mouse click. Because it understands lexical syntax, it isn't fooled by comments, whitespace or content.

For example, you can find all identifiers containing the letters TAX by writing a wildcarded identifier (I) search:

I=*TAX*

You can find all numbers in a file greater than 50 and less than 72:

N>50<72

and it will find them regardless of radix or syntax, because it knows the language syntax.
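To show why radix matters for a value-based query like this, here is a tiny Java sketch. It is not how the Search Engine works internally. The class RadixCompare and the method inRange are hypothetical, and it only handles the integer forms that Long.decode accepts (decimal, hex with 0x, 0X or #, and octal with a leading 0).

```java
public class RadixCompare {
    /** True if the integer literal's value lies strictly between low and high. */
    public static boolean inRange(String literal, long low, long high) {
        // Long.decode picks the radix from the literal's prefix.
        long value = Long.decode(literal);
        return value > low && value < high;
    }
}
```

Here 60, 0x3C and 074 all denote the same value, so each of them satisfies the "greater than 50 and less than 72" condition, which a purely textual search for decimal digits would miss.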

You can find all the for loops with an upper bound of 50 or more:

'for' ...  I '<' N>50

If you simply want to find all the constants in the code, just write an unconstrained search for numbers:

N

A logging facility can write all the hits to an XML file for later processing if you like.

Ira Baxter