I am interested in the static analysis tools that are out there, or rather in the APIs they support that let me write my own tools. Over the years at my current job I've written dozens of tools that scrutinize our C++ source code for various things. What I want to know is whether other static analysis APIs are available.

My questions are:

  1. What static analysis API's do you use?
  2. Why do you use it?
  3. Name one thing you have written with it?

As for me, my answers are:

What: I use the API for Understand 4 C++.

Why: I use it because:

  1. The C API for it is one header file (very small).
  2. The C API requires almost no memory management.
  3. I wrote a managed wrapper around it, so I can use it from C#!
  4. The API is very small but powerful at finding various things.

One tool: Last week I wrote a tool that takes a virtual function on a base class and changes the accessibility of it and of all its overrides in derived classes. This would have taken me a week to do by hand. Using the tool, which took very little time to write, I was able to change almost a thousand files at the push of a button. Cool!
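The core of such a tool is working out which declarations to touch. Below is a minimal sketch under stated assumptions: it uses a hypothetical, simplified class-hierarchy model (`Hierarchy`, `ClassInfo`, `Method`, `derivesFrom` and `changeVirtualAccess` are illustrative names, not the Understand API) and matches overrides by an exact signature string. A real tool would also have to rewrite the source text at each declaration's location.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of what a code-analysis API might report.
// These types are illustrative only; they are not the Understand API.
enum class Access { Public, Protected, Private };

struct Method {
    std::string signature;          // e.g. "void draw()", used to match overrides
    bool isVirtual = false;
    Access access = Access::Public;
};

struct ClassInfo {
    std::vector<std::string> bases;          // direct base classes
    std::map<std::string, Method> methods;   // keyed by signature
};

using Hierarchy = std::map<std::string, ClassInfo>;

// True if `cls` derives, directly or indirectly, from `base`.
bool derivesFrom(const Hierarchy& h, const std::string& cls, const std::string& base) {
    auto it = h.find(cls);
    if (it == h.end()) return false;
    for (const auto& b : it->second.bases)
        if (b == base || derivesFrom(h, b, base)) return true;
    return false;
}

// Sets the access of the virtual function `signature` declared in `base`, and of
// every override of it in derived classes. Returns the classes that were changed.
std::set<std::string> changeVirtualAccess(Hierarchy& h, const std::string& base,
                                          const std::string& signature, Access newAccess) {
    std::set<std::string> changed;
    auto baseIt = h.find(base);
    if (baseIt == h.end()) return changed;
    auto baseMethod = baseIt->second.methods.find(signature);
    if (baseMethod == baseIt->second.methods.end() || !baseMethod->second.isVirtual)
        return changed;  // nothing to do unless the base declares it virtual

    for (auto& [name, info] : h) {
        if (name != base && !derivesFrom(h, name, base)) continue;
        auto m = info.methods.find(signature);
        if (m == info.methods.end()) continue;  // this class doesn't override it
        m->second.access = newAccess;
        changed.insert(name);
    }
    return changed;
}
```

Matching on the signature string is a simplification: a production version would compare parameter types after resolving typedefs and would have to allow for covariant return types.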

Note: I've also played around with the C++ code model available in Visual Studio and have successfully written macros that target it.

Thanks, and I look forward to any answers you may have.

+4  A: 

clang attempts to provide a useful set of libraries for static analysis of the languages it supports. Unfortunately, although its C support is pretty good, its C++ support is currently pretty incomplete.

Why use it? It's a full-blown compiler, so you can get full visibility into the code you're working with. The APIs are (at least mostly) pretty nicely designed C++.

I haven't written anything particularly serious with it yet. I'm currently working on a tool that uses the Index library to find headers that are included but never referenced, but it isn't finished yet (and may never be: I really intended it as an excuse to do some exploring, not as a useful tool).
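One plausible shape for that check, sketched under assumptions: the indexer has already produced, for each translation unit, its direct `#include`s and the set of symbols it references, plus the symbols each header declares. `TranslationUnit`, `HeaderDecls` and `unusedIncludes` are hypothetical names, not part of Clang's Index library.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical indexer output for one translation unit (illustrative names).
struct TranslationUnit {
    std::vector<std::string> directIncludes;   // headers #included by the .cpp file
    std::set<std::string> referencedSymbols;   // symbols the .cpp file actually uses
};

// header -> symbols declared in that header
using HeaderDecls = std::map<std::string, std::set<std::string>>;

// Returns the direct includes none of whose declared symbols are referenced.
// Headers with no known declarations are skipped rather than flagged.
std::vector<std::string> unusedIncludes(const TranslationUnit& tu, const HeaderDecls& decls) {
    std::vector<std::string> unused;
    for (const auto& header : tu.directIncludes) {
        auto it = decls.find(header);
        if (it == decls.end() || it->second.empty()) continue;  // no data: be conservative
        bool used = false;
        for (const auto& sym : it->second) {
            if (tu.referencedSymbols.count(sym)) { used = true; break; }
        }
        if (!used) unused.push_back(header);
    }
    return unused;
}
```

This naive version would flag a header that exists only to include other headers, and it ignores macros, so its output is a list of candidates to review rather than a safe automatic edit.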

Jerry Coffin
That looks interesting. I'll have to play with it. Care to edit your answer to address the rest of the questions?
C Johnson
I would think that finding headers that are included unnecessarily is an important task. Our build times at work are now up to 1 hour 55 minutes. I would love to see them reduced by work like that.
C Johnson
Well, there's eliminating useless include files, and there's useless include file content. Using DMS (see the other answer) on large C systems (25M lines), we found that, averaged over many compilation units, 90%+ of the content of all include files consists of definitions that a given compilation unit doesn't use. (Different compilation units may use different parts of the same include file.) So the real problem appears to be fragmenting include files into pieces so that the rarely used parts aren't included. We haven't explored that option.
Ira Baxter
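The measurement described in that comment can be expressed as a small computation, assuming you already know which definitions each header provides and which symbols each compilation unit uses. This is a hypothetical sketch (`CompilationUnit` and `averageUnusedFraction` are illustrative names), not how DMS computes it.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-unit facts (illustrative names, not a DMS output format).
struct CompilationUnit {
    std::set<std::string> includedHeaders;   // every header the unit pulls in
    std::set<std::string> usedSymbols;       // definitions the unit actually uses
};

// Average fraction of `header`'s definitions that go unused, taken over the
// compilation units that include it. Returns -1.0 if no unit includes it or
// the header defines nothing.
double averageUnusedFraction(const std::string& header,
                             const std::set<std::string>& headerDefs,
                             const std::vector<CompilationUnit>& units) {
    if (headerDefs.empty()) return -1.0;
    double total = 0.0;
    int including = 0;
    for (const auto& cu : units) {
        if (!cu.includedHeaders.count(header)) continue;
        std::size_t unused = 0;
        for (const auto& def : headerDefs)
            if (!cu.usedSymbols.count(def)) ++unused;
        total += static_cast<double>(unused) / static_cast<double>(headerDefs.size());
        ++including;
    }
    return including > 0 ? total / including : -1.0;
}
```

For a header with four definitions included by two units that use one and two of them respectively, the per-unit unused fractions are 0.75 and 0.5, averaging 0.625; the two units also use different definitions, which is the fragmentation opportunity the comment points at.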
Ah, the dangers of massively large header files. We have header files that are routinely over 10,000 lines long. Only idiots spam the code like that. I wrote a shredder app that shreds an API into the smallest units possible: one header per class, one header per function definition, etc. It also re-hooked the include dependencies so the 'new' API would compile. It was supposed to reduce the inclusion of stuff that isn't needed. I didn't get to play with it more, since 'management' deemed it unnecessary for that release.
C Johnson
+2  A: 

Our DMS Software Reengineering Toolkit is commercially available, general purpose machinery for parsing/analyzing/transforming source code for many languages, including C, C++, C#, Java, COBOL, ...

It uses explicit language definitions (e.g., BNF) to drive parsing machinery that builds ASTs directly; DMS supports multiple dialects for some languages. There are built-in analyzers to support symbol table construction, control and data flow analysis, points-to analysis, symbolic range analysis, ...

For C, Java and COBOL, the built-in analysis machinery is tied to the language definitions, so you can use these analyzers as a foundation for any custom analysis you might want to build. For C++, the symbol tables are available but aren't yet tied to the other internal analyzers; the machinery is there, though.

DMS also provides procedural and source-to-source transformations, conditioned by analysis results, on top of all of this; the modified ASTs can be prettyprinted to regenerate compilable source complete with the original comments.

Your three questions:

1. What static analysis APIs do you use?

  • DMS + the APIs I've described above.
  • You can also use the transformation machinery to get dynamic analysis (e.g., by instrumenting the source).

2. Why do you use it?

  • Mostly to support custom tool construction. It's amazing how many different questions people have about code, and how many ways they want to reshape a large application.

3. Name one thing you have written with it?

  • B-2 Stealth Bomber JOVIAL-to-C translator (seriously, see website).
  • IBM Mainframe application architecture extraction.
  • Automated C++ component restructuring.
  • Clone Detection.
  • Test Coverage and Profilers
  • Smart Differencer
  • (See the website for a longer, more detailed list.)
Ira Baxter
Given C++'s awkward syntax, I guess the BNF definition is quite messy, isn't it?
Matthieu M.
@Matthieu: The C++ grammar follows the definition from the ANSI manual pretty closely, modulo adjustments for various dialects (MS <> GCC <> ...) and our special treatment of preprocessor directives. You can argue the ANSI definition is messy, but languages are what languages are. At the BNF level it doesn't seem materially worse than the definitions for C# or Java. Where C++ is truly atrocious is the logic for name and type resolution, e.g., building symbol tables that accurately implement Koenig lookup. Half the value of DMS is having this already done.
Ira Baxter
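A small example of why name resolution is the hard part: the function that `describe(p)` binds to below is not visible through ordinary lookup at the call site, so a symbol table builder has to implement argument-dependent (Koenig) lookup to resolve the call correctly. The namespace and function names are made up for illustration.

```cpp
#include <string>

namespace geometry {
struct Point { int x, y; };

// Not visible by ordinary unqualified lookup from outside `geometry`.
std::string describe(const Point&) { return "geometry::describe"; }
}  // namespace geometry

std::string describe(int) { return "::describe(int)"; }

std::string callSite() {
    geometry::Point p{1, 2};
    // Ordinary lookup finds only ::describe(int), which cannot take a Point.
    // Argument-dependent lookup also searches namespace `geometry` (where
    // Point is declared), so overload resolution picks geometry::describe.
    return describe(p);
}
```

A resolver that implemented only ordinary lookup would either report no viable function here or, in a variant where `::describe` could accept a `Point` through a conversion, bind the call to the wrong function.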
Thanks for the answer. I'm going to have to check this out.
C Johnson
@Ira Baxter: Thanks for your answer, and congratulations for successfully parsing C++ ;)
Matthieu M.
+1  A: 

NDepend doesn't (yet) come with an API, but it is very flexible thanks to CQL (Code Query Language), which lets users quickly write their own static analysis rules.

Patrick Smacchia - NDepend dev
This is a pretty cool tool. Thanks for the link.
C Johnson
+2  A: 

Our tool, CodeSonar, is a commercial, advanced static analysis tool for C/C++ programs. It offers several APIs that can be used to extend its functionality. Note that it is designed for analysis, not for program transformation.

There are APIs (in both C and Scheme) that allow access to the program's ASTs (which include symbol tables), the CFGs for each subprogram, the whole-program call graph, compilation units, include files, etc. All these representations are cross-associated with position information, so it is possible to get back to the line of code responsible.

The analysis engine visits all of these data structures, and a user can write a checker by specifying a callback to be invoked during the visit.
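The callback style described here can be sketched with a toy AST and a hypothetical `Engine` class that walks the tree once and hands every node to each registered checker. This only illustrates the visitor-with-callbacks pattern; CodeSonar's actual C and Scheme APIs are not shown and will differ.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy AST node (hypothetical; a real engine's representations are far richer).
struct Node {
    std::string kind;            // e.g. "function", "call"
    std::string name;            // function name or callee name
    int line = 0;                // source position, so reports can point back
    std::vector<Node> children;  // incomplete element type is allowed since C++17
};

using Checker = std::function<void(const Node&)>;

// Hypothetical engine: one pre-order walk; every registered checker sees every node.
class Engine {
public:
    void registerChecker(Checker c) { checkers_.push_back(std::move(c)); }

    void visit(const Node& n) const {
        for (const auto& check : checkers_) check(n);
        for (const auto& child : n.children) visit(child);
    }

private:
    std::vector<Checker> checkers_;
};
```

A checker for calls to `gets`, for example, then reduces to a lambda that tests `n.kind == "call" && n.name == "gets"` and records `n.line`. Registering many checkers against one traversal keeps the cost of walking the program shared among them.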

CodeSonar is a path-sensitive analysis tool. Path exploration is hard because some paths are infeasible and excluding those from consideration takes some effort. It is important to exclude infeasible paths to keep false positives low. CodeSonar allows users to piggyback on its path exploration, again using a visitor pattern, which allows them to write path-sensitive checkers without having to implement feasible-path exploration themselves.
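To illustrate why pruning infeasible paths matters, here is a deliberately tiny model: every branch tests `x > threshold` on a single integer `x`, and the explorer carries the range of `x` implied by the branch outcomes so far, abandoning a path as soon as that range is empty. This interval-based toy (`Branch`, `Range`, `explore` and `feasiblePaths` are hypothetical names) is not CodeSonar's path-exploration algorithm, which reasons about far richer constraints.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Toy program: a straight sequence of branches, each "if (x > threshold)"
// on the same int x. Hypothetical model for illustration only.
struct Branch { long long threshold; };

// Range of values x can still take on the current path.
struct Range {
    long long lo, hi;
    bool empty() const { return lo > hi; }
};

// Depth-first path exploration that narrows the range of x at each branch
// outcome and abandons a path as soon as the range becomes empty.
void explore(const std::vector<Branch>& branches, std::size_t i, Range r,
             const std::string& path, std::vector<std::string>& feasible) {
    if (r.empty()) return;  // infeasible: prune the whole subtree
    if (i == branches.size()) { feasible.push_back(path); return; }
    const long long t = branches[i].threshold;
    explore(branches, i + 1, Range{std::max(r.lo, t + 1), r.hi}, path + "T", feasible);  // x > t
    explore(branches, i + 1, Range{r.lo, std::min(r.hi, t)}, path + "F", feasible);      // x <= t
}

std::vector<std::string> feasiblePaths(const std::vector<Branch>& branches) {
    std::vector<std::string> feasible;
    explore(branches, 0, Range{INT_MIN, INT_MAX}, "", feasible);
    return feasible;
}
```

For two consecutive `if (x > 0)` tests, naive enumeration yields four paths, but only `TT` and `FF` survive; a checker that reported a bug on the `TF` path would be producing exactly the kind of false positive the paragraph above describes.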

This mechanism has been used to implement a checker that finds deviations from a fairly complex error reporting idiom.

Another way to write checks is to use a different special-purpose API whose purpose is not to be executed, but to educate the analysis engine about properties of the program. Roughly speaking you can use this API to write code that is similar to what you would write for a dynamic check for the property, but which is instead "interpreted" by the symbolic execution engine. You can decorate your own code with calls to this API, or keep it all off to the side.

Many of CodeSonar's built-in checkers for API usage are specified exactly this way.

Writing checks is only half the battle. Once you have a checker in production you need a way to manage what it finds. All of the mechanisms described above generate reports that populate a database, and there is a web-client based UI for looking at the results, attaching notes, integrating with other tools, etc.

I hope this helps!

Paul Anderson
It sounds like this app and its API are used for writing static analysis tools that simulate path coverage and the like? I use an API that doesn't do any of that; it simply tells me who calls something, where it was called, how many members a class has, their types, etc. Will CodeSonar do that? (Err... quickly? :) )
C Johnson
Sounds like your tool does something like Coverity?
C Johnson
Yes, CodeSonar's API does give you access to all that information. The only caveat is that if you need to consider indirect calls (either through function pointers or virtual functions), then to get a complete call graph you need to do a whole-program alias analysis. We do have an option for that, but such algorithms are slow and imprecise by nature, especially if expected to be sound.
Paul Anderson
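The "who calls this, and from where" query can be sketched as a reverse index over call sites. `CallSite` and `CallGraph` are hypothetical types, not CodeSonar's API; calls through function pointers or virtual functions are kept in a separate unresolved list, because resolving their targets needs the kind of alias analysis described above.

```cpp
#include <map>
#include <string>
#include <vector>

// One call site as an analysis might report it (hypothetical type).
struct CallSite {
    std::string caller;
    std::string callee;  // empty when the target is not statically known
    int line = 0;
};

// Reverse call index: callee -> the call sites that invoke it.
class CallGraph {
public:
    void addCall(const CallSite& site) {
        if (site.callee.empty()) unresolved_.push_back(site);  // indirect call
        else callersOf_[site.callee].push_back(site);
    }

    // "Who calls `callee`, and where?" Only resolved (direct) calls are found.
    std::vector<CallSite> callers(const std::string& callee) const {
        auto it = callersOf_.find(callee);
        if (it == callersOf_.end()) return {};
        return it->second;
    }

    const std::vector<CallSite>& unresolved() const { return unresolved_; }

private:
    std::map<std::string, std::vector<CallSite>> callersOf_;
    std::vector<CallSite> unresolved_;
};
```

Keeping unresolved calls visible, rather than silently dropping them, makes it clear to the user how incomplete the answer to a "callers of" query may be.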
And yes, our tool is very similar to Coverity Prevent in many respects. They are certainly our biggest competitor.
Paul Anderson