I need to perform static source analysis on Java code. Ideally, I want the system to work out of the box without much modification from me.

For example, I have used Antlr in the past, but I spent a lot of time building grammar files and still didn't get what I wanted.

I want to be able to parse a Java file and have it return the character positions of, say:

  1. Character position start and end of a Java block comment
  2. Character position start and end of a Java class file
  3. Character position start and end of a Java method declaration, signature, and implementation.

It looks like Antlr will do that, but I have yet to finish a grammar that actually gives me the positions of the code I need.

Does anyone have a complete Antlr grammar and Java code that gives the character positions of these parts of the Java source?

+1  A: 

The Java CUP parser generator has a complete grammar of the Java Language. Have a look here: http://www.cs.princeton.edu/~appel/modern/java/CUP/

If the grammar does not turn out to provide positions, they are easily added for the constructs you're interested in. From the CUP documentation:

Each symbol on the right hand side can optionally be labeled with a name. Label names appear after the symbol name separated by a colon (:). Label names must be unique within the production, and can be used within action code to refer to the value of the symbol. Along with the label, two more variables are created, which are the label plus left and the label plus right. These are int values that contain the right and left locations of what the terminal or non-terminal covers in the input file.

aioobe
A parser generator gets you at best a parser. With effort, you can build an AST. What you need are symbol tables, control and data flow analysis, inheritance and call graphs, points-to analysis... Starting fresh with just a parser generator is pretty much a guarantee you'll never get around to doing the static analysis you want. Go get a framework with all this stuff built in.
Ira Baxter
+2  A: 

Use the javaparser library hosted on Google Code: read a CompilationUnit, then write a visitor.

http://code.google.com/p/javaparser/wiki/UsingThisParser#Visiting_class_methods

Unfortunately there are no online javadocs, but here's the source:

http://javaparser.googlecode.com/svn/trunk/JavaParser/src/japa/parser/ast/visitor/VoidVisitor.java

http://javaparser.googlecode.com/svn/trunk/JavaParser/src/japa/parser/ast/visitor/VoidVisitorAdapter.java

This library is also used by Spring Roo to generate source code.


EDIT:

Basically, you use a generic visitor that does nothing at all and override the methods for the node types you are interested in (e.g. method declaration, type declaration, block comment, etc.).
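
As a rough illustration of that "do-nothing visitor plus overrides" idea, here is a minimal self-contained sketch. The node classes (`MethodNode`, `BlockCommentNode`), the `VisitorAdapter` class, and the `methodSpans` helper are hypothetical names made up for this example; they are not the javaparser API, and a real library's adapter would typically also walk into child nodes.

```java
import java.util.ArrayList;
import java.util.List;

public class VisitorSketch {

    // Hypothetical node types, for illustration only; a parser library supplies its own.
    interface Node {
        void accept(Visitor v);
    }

    static final class MethodNode implements Node {
        final String name;
        final int begin;
        final int end;

        MethodNode(String name, int begin, int end) {
            this.name = name;
            this.begin = begin;
            this.end = end;
        }

        public void accept(Visitor v) {
            v.visit(this);
        }
    }

    static final class BlockCommentNode implements Node {
        final int begin;
        final int end;

        BlockCommentNode(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        public void accept(Visitor v) {
            v.visit(this);
        }
    }

    interface Visitor {
        void visit(MethodNode n);
        void visit(BlockCommentNode n);
    }

    // The adapter: every visit method is a no-op, so subclasses override only what they need.
    static class VisitorAdapter implements Visitor {
        public void visit(MethodNode n) { }
        public void visit(BlockCommentNode n) { }
    }

    // Collects "name:begin-end" for method nodes and silently ignores everything else.
    static List<String> methodSpans(List<Node> nodes) {
        final List<String> out = new ArrayList<String>();
        Visitor v = new VisitorAdapter() {
            @Override
            public void visit(MethodNode n) {
                out.add(n.name + ":" + n.begin + "-" + n.end);
            }
        };
        for (Node n : nodes) {
            n.accept(v);
        }
        return out;
    }
}
```

The point of the adapter is that adding interest in another node type means overriding one more method, without touching the traversal.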

seanizer
I like this option.
Berlin Brown
+4  A: 

Java 6 has some facilities for this: see "Source Code Analysis Using Java 6 APIs" by Seema Richard on java.net.
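
A minimal sketch of what this can look like with the Compiler Tree API (`javax.tools` plus `com.sun.source`), assuming a full JDK rather than a JRE. The class names `PositionScanner`, `StringSource`, and `Span` are invented for this example. It only parses the source and reports character offsets for class and method declarations; plain block comments are not tree nodes in this API, so they would need separate handling.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PositionScanner {

    // Hypothetical helper: an in-memory source file, so no files on disk are needed.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String name, String code) {
            super(URI.create("string:///" + name), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Hypothetical result type: kind, simple name, and character offsets (end is exclusive).
    static final class Span {
        final String kind;
        final String name;
        final long start;
        final long end;

        Span(String kind, String name, long start, long end) {
            this.kind = kind;
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    static List<Span> scan(String fileName, String code) throws Exception {
        // Returns null when running on a JRE without the compiler module.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null,
                Collections.singletonList(new StringSource(fileName, code)));
        final SourcePositions positions = Trees.instance(task).getSourcePositions();
        final List<Span> out = new ArrayList<Span>();

        // parse() only builds syntax trees; no symbol resolution or type checking happens.
        for (final CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitClass(ClassTree node, Void p) {
                    add("class", node.getSimpleName().toString(), node);
                    return super.visitClass(node, p); // keep descending into members
                }

                @Override
                public Void visitMethod(MethodTree node, Void p) {
                    add("method", node.getName().toString(), node);
                    return super.visitMethod(node, p);
                }

                private void add(String kind, String name, Tree node) {
                    out.add(new Span(kind, name,
                            positions.getStartPosition(unit, node),
                            positions.getEndPosition(unit, node)));
                }
            }.scan(unit, null);
        }
        return out;
    }
}
```

Calling `PositionScanner.scan("A.java", source)` and then taking `source.substring((int) span.start, (int) span.end)` should give back the text of each declaration, which is a quick way to sanity-check the offsets.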

Dilum Ranatunga
+1 for the good link.
fastcodejava
this is awesome!! does anybody know where I can download the compiler tree api (preferably via maven)? couldn't find a link...
seanizer
A: 

XTC contains a Java 1.5 parser, and you can access the location of all the nodes.

Then you can write a visitor that visits only the kinds of nodes you want.

LB
A: 

The DMS Software Reengineering Toolkit is a highly customizable program analysis and transformation tool. It has a full, robust Java front end (parser, tree-builder, name resolver, flow analyzers, call graph extraction, etc.).

Each element of the tree is stamped with file source location precise to the line and column number. End line and end column data are also available. Comments in the source are captured and associated with tree nodes; they also have precise source position information.

You will find that once you get the grammar right, you still need all that other stuff to do serious static analysis, and that other stuff is much harder than getting the grammar right.

Ira Baxter
+1  A: 

If your main concern is static analysis, then I wouldn't care so much about which parser is used. Parsing is the boring part, and you will probably need Java-specific information for your analysis (like the class inheritance tree) that you would have to calculate yourself if you don't use a proper framework. What type of analysis do you intend to perform?

Another alternative might be Soot. Although it is aimed at optimization and manipulation of Java bytecode, it naturally comes with analysis support.

ShiDoiSi