Motivation

Currently I'm using the Java parser japa to create an abstract syntax tree (AST) of a Java file. With this AST I'm doing some code generation (e.g. if there's an annotation on a method, create some other source files, ...).

Problem

As my code generation becomes more complex, I have to dive deeper into the structure of the AST (e.g. I have to use visitors to extract type information about method parameters).

But I'm not sure whether I want to stay with japa or switch to a different parser library later.

Because my code generator uses FreeMarker (whose templates don't benefit from automatic refactoring), I want the interface it uses to access the AST information to remain stable, even if I decide to change the Java parser.

Question

What's the best way to encapsulate complex data structures of third-party libraries?

  1. I could create my own datatypes and copy the parts of the AST that I need into these.

  2. I could create lots of specialized access methods that work with the AST and produce exactly the information I need (e.g. the fully qualified return type of a method as one string, or the first type parameter of a class).

  3. I could create wrapper classes for the japa data structures I currently need and embed the japa types inside, so that I can delegate requests to the japa types and transform the resulting japa types back into my wrapper classes.

Which solution should I take? Are there other (better) solutions to this problem?

A:

My vote is for 2 or 3, siding with 3. (Edit: my reasons for not choosing 1 are given below.)

For (2), you could start with an ASTNode interface:

import java.util.List;

interface ASTNode
{
   // The direct children of this node
   List<ASTNode> children();
}

You could then create subtypes for specific types of syntax node, or add generic collections to the interface for retrieving attributes.
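To make the subtyping idea concrete, here is a minimal sketch. The names `MethodNode`, `attribute` and `SimpleMethodNode` are hypothetical, and the in-memory implementation exists only so the interfaces can be exercised; a real implementation would delegate to the parser's own nodes.

```java
import java.util.List;

// Parser-neutral base interface (hypothetical sketch).
interface ASTNode {
    List<ASTNode> children();

    // Generic attribute access, e.g. attribute("name"); returns null if absent.
    String attribute(String key);
}

// A specialised subtype for method declarations (hypothetical).
interface MethodNode extends ASTNode {
    String name();

    // Fully qualified return type, e.g. "java.lang.String".
    String qualifiedReturnType();
}

// Minimal in-memory implementation, only so the sketch can be run.
final class SimpleMethodNode implements MethodNode {
    private final String name;
    private final String returnType;

    SimpleMethodNode(String name, String returnType) {
        this.name = name;
        this.returnType = returnType;
    }

    @Override
    public List<ASTNode> children() {
        return List.of(); // a method node with no children in this sketch
    }

    @Override
    public String attribute(String key) {
        if ("name".equals(key)) return name;
        if ("returnType".equals(key)) return returnType;
        return null;
    }

    @Override
    public String name() { return name; }

    @Override
    public String qualifiedReturnType() { return returnType; }
}
```

Templates can then depend only on `ASTNode` and its subtypes, so swapping parsers means writing new implementations rather than touching the templates.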

You then implement this interface, e.g. JapaASTNode, which wraps the AST node from the library. The drawback is that you end up creating a lot of wrappers, and managing them can be tricky, e.g. maintaining the mapping from your wrappers to the underlying ASTs. Some clients may expect to get back the same wrapper for the same AST, rather than a new wrapper each time that AST is used. This can be solved with a WeakHashMap that keeps track of which wrapper is used for each AST, without holding on to the AST for longer than necessary. (The map's values should be weak references too: each wrapper holds a strong reference to its AST, and a strongly held value would keep its key from ever being collected.)
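A minimal sketch of the wrapper approach with a WeakHashMap-backed cache, assuming a hypothetical library node class `LibNode` in place of the real japa types (`LibASTNode` and `WrapperCache` are likewise illustrative names). The values are held through `WeakReference`, so the same wrapper is returned for as long as some client still holds it. `WeakHashMap` compares keys with `equals()`; the sketch assumes the library's node class uses identity equality, as `LibNode` does.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Stand-in for a third-party node type (hypothetical; not the japa API).
final class LibNode {
    final List<LibNode> kids = new ArrayList<>();
}

interface ASTNode {
    List<ASTNode> children();
}

// Wraps one library node; child wrappers come from the shared cache.
final class LibASTNode implements ASTNode {
    private final LibNode node;
    private final WrapperCache cache;

    LibASTNode(LibNode node, WrapperCache cache) {
        this.node = node;
        this.cache = cache;
    }

    @Override
    public List<ASTNode> children() {
        List<ASTNode> result = new ArrayList<>();
        for (LibNode kid : node.kids) {
            result.add(cache.wrap(kid));
        }
        return result;
    }
}

// Returns the same wrapper for the same library node while that wrapper is reachable.
final class WrapperCache {
    // Weak keys, and weak values: a wrapper strongly references its LibNode,
    // so a strong value would keep the key reachable and defeat the WeakHashMap.
    private final Map<LibNode, WeakReference<ASTNode>> wrappers = new WeakHashMap<>();

    synchronized ASTNode wrap(LibNode node) {
        WeakReference<ASTNode> ref = wrappers.get(node);
        ASTNode wrapper = (ref == null) ? null : ref.get();
        if (wrapper == null) {
            wrapper = new LibASTNode(node, this);
            wrappers.put(node, new WeakReference<>(wrapper));
        }
        return wrapper;
    }
}
```

The Java standard library has no weak identity map, so if the real AST classes override `equals()`, structurally equal but distinct nodes would share one wrapper here; that case would need a different key strategy.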

A real-world example of this pattern is found in dom4j: interfaces are used to wrap the DOM objects from different XML parsers, hiding the specific DOM implementation from the rest of the system.

The other alternative (3) is not to wrap the objects, but to provide navigation methods that know how to find related objects (e.g. child nodes) and extract information, e.g. getting the tokens associated with an AST node.

This is done by having a Navigator interface that takes ASTs (in their original form from the library) and knows how to deal with them, e.g.

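import java.util.ArrayList;
import java.util.List;

interface Navigator
{
   List<Object> getChildren(Object ast);
   List<Object> getTokens(Object ast);
}

which might be implemented like this:

class JapaNavigator implements Navigator
{
    // for illustration - not the actual japa api (JapaAST is a placeholder type)
    public List<Object> getChildren(Object ast) {
        JapaAST japaAST = (JapaAST) ast;   // throws ClassCastException for non-AST input
        return new ArrayList<Object>(japaAST.getChildren());
    }

    public List<Object> getTokens(Object ast) {
        JapaAST japaAST = (JapaAST) ast;
        return new ArrayList<Object>(japaAST.getTokens());
    }
}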

The casting looks a little ugly, but keep in mind that it is internal to the navigator implementation.

This works well if the object models of the libraries are similar, which can reasonably be expected for ASTs, and it saves you from having to wrap absolutely everything. The downside is that externally you are working with Object all the time. However, there is little danger of using it the wrong way: the specific Navigator implementation casts these objects back to their expected AST type, so errors are discovered quickly (e.g. passing an Object that is a token to a method expecting an AST).
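The fail-fast behaviour can be seen in a small sketch. `LibAst`, `LibToken` and `LibNavigator` are hypothetical stand-ins for a parser's node, token and navigator types, not a real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical library types standing in for a parser's node and token classes.
final class LibAst {
    final List<LibAst> children = new ArrayList<>();
}

final class LibToken {
    final String text;
    LibToken(String text) { this.text = text; }
}

interface Navigator {
    List<Object> getChildren(Object ast);
}

final class LibNavigator implements Navigator {
    @Override
    public List<Object> getChildren(Object ast) {
        // The cast checks the runtime type: anything that is not a LibAst
        // (e.g. a LibToken) fails right here with a ClassCastException.
        LibAst node = (LibAst) ast;
        return new ArrayList<Object>(node.children);
    }
}
```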

For a real-world example, see jaxen, which successfully applies XPath over a number of different object models, similar to the way you would want to traverse different ASTs.
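In the same spirit as jaxen, parser-independent code is written once against `Navigator` and reused for each object model. The sketch below assumes two hypothetical node classes, `NodeA` and `NodeB`, whose children are stored differently; only their navigators know about them, and `Traversal.countNodes` is an illustrative helper, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

interface Navigator {
    List<Object> getChildren(Object ast);
}

// Two hypothetical "parser libraries" with differently shaped nodes.
final class NodeA {
    final List<NodeA> kids = new ArrayList<>();
    NodeA add(NodeA kid) { kids.add(kid); return this; }
}

final class NodeB {
    final NodeB[] subnodes;
    NodeB(NodeB... subnodes) { this.subnodes = subnodes; }
}

final class NavigatorA implements Navigator {
    @Override
    public List<Object> getChildren(Object ast) {
        return new ArrayList<Object>(((NodeA) ast).kids);
    }
}

final class NavigatorB implements Navigator {
    @Override
    public List<Object> getChildren(Object ast) {
        return new ArrayList<Object>(Arrays.asList(((NodeB) ast).subnodes));
    }
}

// Parser-independent code: it sees only Navigator and Object.
final class Traversal {
    static int countNodes(Navigator nav, Object root) {
        int count = 1; // the root itself
        for (Object child : nav.getChildren(root)) {
            count += countNodes(nav, child);
        }
        return count;
    }
}
```

Switching parsers then means writing a new `Navigator`, while code like `countNodes` (or the template-facing layer) stays unchanged.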

EDIT:

While (1) will work, you will write a lot of iterators, converters and various AST model objects to build your parser-independent model, which ends up being quite a bit of code (e.g. duplicating all the nodes and all the tokens, plus visitors for handling the different node types). Then you'll have to duplicate most of that if you move to a different parser. Of course, you could avoid duplicating the builder for each parser by implementing a parser-neutral interface using (2) or (3) above and using that as the basis of your copying, but that's the programming equivalent of chasing your tail!
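For comparison, a minimal sketch of approach (1): a parser-neutral model class plus a builder that copies data out of the parser's node. `MethodModel`, `LibMethodDecl` and `LibModelBuilder` are hypothetical names. A real model would need one such class and builder for every node type you use, which is where the code volume described above comes from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Parser-neutral model owned by the code generator (hypothetical).
final class MethodModel {
    final String name;
    final String returnType;
    final List<String> parameterTypes;

    MethodModel(String name, String returnType, List<String> parameterTypes) {
        this.name = name;
        this.returnType = returnType;
        // Defensive copy: the model does not share state with the parser's node.
        this.parameterTypes = Collections.unmodifiableList(new ArrayList<>(parameterTypes));
    }
}

// Stand-in for a parser's method declaration (hypothetical; not the japa API).
final class LibMethodDecl {
    String name;
    String type;
    List<String> params = new ArrayList<>();
}

// One builder per parser library: this is what must be rewritten on a switch.
final class LibModelBuilder {
    static MethodModel build(LibMethodDecl decl) {
        return new MethodModel(decl.name, decl.type, decl.params);
    }
}
```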

mdma
+1: Thank you very much for your advice!
tangens