Can you recommend a library that presents Java bytecode as Eclipse's Java AST (i.e. with nodes descending from org.eclipse.jdt.core.dom.ASTNode)?

+1  A: 

You're looking for a decompiler, and a way to turn that tool's AST representation into Eclipse's AST representation.

Java Decompiler has an Eclipse plugin to re-create source code. You could parse that with Eclipse's ASTParser to get the AST you want (or maybe you just wanted the source code to begin with).

The biggest roadblock you'll run into is when the decompiler cannot accurately re-create the original source code syntax (either because of obfuscation or because it simply cannot understand a bytecode construct). What source code does it generate? How then does the Eclipse ASTParser handle that? YMMV.

There are many decompilers for Java, all with varying abilities depending on the compiler/obfuscator that created the bytecode they are attempting to decompile. If Java Decompiler doesn't work for you, you might create an Eclipse plugin for one of the others.

Chadwick
I'm not interested in source code per se. The situation is that I have a program analysis tool that runs on the AST, and I would like it to work on classes without source code as well. I suppose the "vocabulary" consisting of Eclipse's ASTNode descendants should be sufficient to represent bytecode constructs more or less directly. If not, then I should probably follow your advice and add the decompilation step.
Aivar
It doesn't really make sense. AST means Abstract Syntax Tree, and by the time you get to bytecode the one thing you have definitely lost is the Java syntax. You would have to define a whole new set of nodes for bytecode and then adapt your tool to recognize those nodes.
EJP
@EJP Converting bytecode to a source AST is a complex process of pattern recognition, flow analysis, and other tricks. Once an AST is created, source code is dumped from that. Even the best decompilers can't reproduce all source exactly like the original (multiple ways to program the same loops, whitespace differences, etc.), but a correct decompiler will produce source code that can then be compiled to the same bytecode the decompiler started with.
Chadwick
Exactly. He needs either a decompiler to get back to the source, a decompiler that can produce the same AST from the bytecode, or an AST that corresponds directly to the bytecode.
EJP
For program analysis purposes I don't need the same AST that the original Java program had: an AST with the same semantics would do, and I don't mind if a complex expression becomes a series of simple expressions assigned to temporary variables. I guess it should be quite straightforward to represent bytecode in Java terms: arithmetic operations map to Assignment + InfixExpression, method calls map to MethodInvocation nodes, and so on. Jumps could theoretically cause problems (actually I'm not sure whether the verifier allows jumps so tricky that they can't be translated to if statements).
Aivar
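
The idea in the last comment (stack operations becoming simple expressions assigned to temporaries) can be sketched roughly as follows. This is only an illustration under loud assumptions: the class `StackToStatements`, its `translate` method, and the textual instruction format (`iload a`, `iconst 2`, with variable names instead of the slot indices real bytecode uses) are all hypothetical and not part of Eclipse JDT or any decompiler. It emits statements as Java source text rather than building ASTNode objects, and it handles only straight-line integer arithmetic, with no jumps or method calls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper for illustration; not an Eclipse JDT or decompiler API.
class StackToStatements {

    /**
     * Translates a straight-line sequence of simplified, JVM-like stack
     * instructions into three-address statements written as Java source text.
     * Assumption: locals are referred to by name, not by slot index.
     */
    static List<String> translate(List<String> instructions) {
        Deque<String> stack = new ArrayDeque<>(); // symbolic operand stack
        List<String> statements = new ArrayList<>();
        int temps = 0; // counter used to name temporaries t0, t1, ...

        for (String insn : instructions) {
            String[] parts = insn.trim().split("\\s+");
            switch (parts[0]) {
                case "iload":  // push a local variable (by name)
                case "iconst": // push an integer literal
                    stack.push(parts[1]);
                    break;
                case "iadd":
                case "isub":
                case "imul": {
                    // The right operand was pushed last, so it is popped first.
                    String right = stack.pop();
                    String left = stack.pop();
                    String temp = "t" + temps++;
                    statements.add("int " + temp + " = " + left + " "
                            + operator(parts[0]) + " " + right + ";");
                    stack.push(temp);
                    break;
                }
                case "istore": // pop the top of the stack into a local
                    statements.add(parts[1] + " = " + stack.pop() + ";");
                    break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
        }
        return statements;
    }

    // Maps an arithmetic opcode to the corresponding Java infix operator.
    private static String operator(String opcode) {
        switch (opcode) {
            case "iadd": return "+";
            case "isub": return "-";
            case "imul": return "*";
            default: throw new IllegalArgumentException(opcode);
        }
    }
}
```

For example, passing `iload a`, `iload b`, `iconst 2`, `imul`, `iadd`, `istore x` to `translate` yields the three statements `int t0 = b * 2;`, `int t1 = a + t0;` and `x = t1;`. A real tool would presumably create Assignment and InfixExpression nodes at the points where this sketch builds strings, and would still need a strategy for branches.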