I want to run static code analysis on a set of scripts written in a not very common programming language with C-like syntax. Frequent problems are:

  • use of undefined or undeclared symbols
  • wrong number or type of arguments when calling a function

The language's interpreter/compiler itself offers no help with these problems.

Is there any lint-like tool flexible enough to be adapted easily to new programming languages? Or does someone know another good starting point (Lex/Yacc, perhaps)?

Thanks in advance

+1  A: 

I doubt you're going to find an all-purpose tool.

Much of static analysis depends on far more than lexical and grammatical compliance.

A good static analyzer is going to have extra-contextual knowledge of the language and its implementation. It may also include a simulator that keeps track of state and multiple execution paths. Additionally, it may be aware of patterns and practices, as well as certain libraries and calls.

For instance, in C, the code if ( x = 3 ) { /*Do something*/ } is perfectly legal, although the programmer may have intended ==. Or one might write printf("%s", longVal); and, while arbitrary values can be pushed onto the stack, that specific call has expectations about its remaining arguments based on the format string passed to it first.
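To make the first case concrete: because the assignment is grammatically valid, a checker has to carry it as a language-specific rule rather than a syntax error. Below is a minimal sketch in Python, assuming C-style if (...) conditions on a single line with no nested parentheses. The function name find_suspicious_assignments is hypothetical, and a real tool would work on a parse tree rather than regular expressions.

```python
import re

# Assumption: conditions fit on one line and contain no nested parentheses.
IF_CONDITION = re.compile(r"\bif\s*\(([^()]*)\)")
# A '=' that is not part of '==', '!=', '<=' or '>='.
LONE_ASSIGN = re.compile(r"(?<![=!<>])=(?!=)")


def find_suspicious_assignments(source):
    """Return (line_number, condition) pairs where an if-condition contains an assignment."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in IF_CONDITION.finditer(line):
            condition = match.group(1)
            if LONE_ASSIGN.search(condition):
                findings.append((lineno, condition.strip()))
    return findings
```

Even this small rule shows the limits of a generic approach: it misses conditions that span lines or contain calls, and it says nothing about whether the assignment was intentional.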

Bottom line: a generic lint application would need to know so much, and languages and libraries are such a moving target, that if such a beast did exist it would be either far too complicated or far too underpowered for practical use, compared with a cheaper tool that does a language-specific job better.

Walt Stoneburner
In the you're-on-your-own department, you might look at translating the not-very-common language into a common language, perhaps through automation or a cross-compiler, and then lint the output of that. Though I suspect you'd only get confirmation that the translation was properly implemented, not that the original code did what was intended.
Walt Stoneburner
Translating languages is hard, and includes building parsers and analyzers for the original language. It would be silly to go through all the trouble of building a translator when the parser and analyzer machinery alone would be enough to build a custom linter. See my bio for my experience.
Ira Baxter
+1  A: 

The commercially available DMS Software Reengineering Toolkit lets you write such consistency checks and is flexible enough to be adapted to many languages.

Pascal Cuoq
... it has very strong support for defining "new" languages. It goes far beyond lex/yacc by providing support for building symbol tables (useful for diagnosing undefined or unused symbols), attribute evaluation to enable easy computation of complexity metrics, and explicit pattern matching expressed in the syntax of the desired target language (useful in writing patterns that check for the wrong number of arguments to known APIs).
Ira Baxter
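
As a rough illustration of the symbol-table idea mentioned above (and not of how DMS itself works), here is a hand-rolled Python sketch that records declared functions and then checks each call for an unknown name or a wrong argument count. Everything about the input language is an assumption: definitions are taken to look like function name(a, b), the keyword set is invented, and the names lint and call_sites are hypothetical. It ignores string literals, comments, variables, and calls that span lines.

```python
import re

# Assumed keywords of the hypothetical C-like language; these are never treated as calls.
KEYWORDS = {"if", "else", "while", "for", "return", "function"}
# Assumed definition syntax: function name(param, param, ...)
DEFINITION = re.compile(r"\bfunction\s+(\w+)\s*\(([^()]*)\)")
CALL_START = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def count_params(text):
    """Count comma-separated parameters in a definition header."""
    return 0 if not text.strip() else len(text.split(","))


def call_sites(line):
    """Yield (name, argument_count) for each 'name(...)' whose parentheses close on this line."""
    for match in CALL_START.finditer(line):
        depth, commas, has_content = 1, 0, False
        i = match.end()
        while i < len(line) and depth > 0:
            ch = line[i]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 1:
                commas += 1  # only top-level commas separate arguments
            if depth > 0 and not ch.isspace():
                has_content = True
            i += 1
        if depth == 0:
            yield match.group(1), (commas + 1 if has_content else 0)


def lint(source):
    """Return (line_number, message) pairs for undefined functions and arity mismatches."""
    # Pass 1: build the symbol table (function name -> parameter count).
    declared = {m.group(1): count_params(m.group(2)) for m in DEFINITION.finditer(source)}
    problems = []
    # Pass 2: check every call against the table.
    for lineno, line in enumerate(source.splitlines(), start=1):
        body = DEFINITION.sub("", line)  # do not treat a definition header as a call
        for name, argc in call_sites(body):
            if name in KEYWORDS:
                continue
            if name not in declared:
                problems.append((lineno, f"undefined function '{name}'"))
            elif declared[name] != argc:
                problems.append(
                    (lineno, f"'{name}' expects {declared[name]} argument(s), got {argc}")
                )
    return problems
```

The two-pass shape (collect declarations, then check uses) is the part that carries over to a real linter. The regex scanning is only a stand-in for a proper parser, which is exactly the machinery the comments above say you need.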