I've been using the ANTLR-supplied ECMAScript grammar with the goal of identifying JavaScript global variables. An AST is produced, and I'm now wondering what the best way is to filter out the global variable declarations.

I'm interested in finding all of the outermost "variableDeclaration" tokens in my AST, but I can't work out how to actually do it. Here's my setup code so far:

import org.antlr.runtime.*;

String input = "var a, b; var c;";
CharStream cs = new ANTLRStringStream(input);

JavaScriptLexer lexer = new JavaScriptLexer(cs);

CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);

JavaScriptParser parser = new JavaScriptParser(tokens);

// program_return is a nested class of the generated parser
JavaScriptParser.program_return programReturn = parser.program();

Being new to ANTLR, can anyone offer any pointers?

A:

I guess you're using the ECMAScript grammar from the ANTLR site.

Although that grammar suggests a proper AST is created, this is not the case. It uses some inline operators to exclude certain tokens from the tree, but it never creates any roots, resulting in a completely flat tree. From such a tree you can't extract all global vars in any reasonable way.

You'll need to adjust the grammar slightly:

Add the following under the options { ... } at the top of the grammar file:

tokens
{
  VARIABLE;
  FUNCTION;
}

Now replace the rules functionDeclaration, functionExpression and variableDeclaration with these:

functionDeclaration
  :  'function' LT* Identifier LT* formalParameterList LT* functionBody 
     -> ^(FUNCTION Identifier formalParameterList functionBody)
  ;

functionExpression
  :  'function' LT* Identifier? LT* formalParameterList LT* functionBody 
     -> ^(FUNCTION Identifier? formalParameterList functionBody)
  ;

variableDeclaration
  :  Identifier LT* initialiser? 
     -> ^(VARIABLE Identifier initialiser?)
  ;

Now a more suitable tree is generated. If you parse the source:

var a = 1; function foo() { var b = 2; } var c = 3;

the following tree is generated:

[Figure: the AST generated for the source above]

All you have to do now is iterate over the children of the root of the tree: whenever you come across a VARIABLE node, you know it's a "global", since all other variables will be under FUNCTION nodes.

Here's how to do that:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = "var a = 1; function foo() { var b = 2; } var c = 3;";
        // lex and parse the source with the modified grammar
        ANTLRStringStream in = new ANTLRStringStream(source);
        JavaScriptLexer lexer = new JavaScriptLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaScriptParser parser = new JavaScriptParser(tokens);
        JavaScriptParser.program_return returnValue = parser.program();
        CommonTree tree = (CommonTree)returnValue.getTree();
        // only the direct children of the root are inspected: VARIABLE nodes
        // declared inside a function sit below a FUNCTION node and are skipped
        for(Object o : tree.getChildren()) {
            CommonTree child = (CommonTree)o;
            if(child.getType() == JavaScriptParser.VARIABLE) {
                // child 0 of a VARIABLE node is its Identifier
                System.out.println("Found a global var: "+child.getChild(0));
            }
        }
    }
}

which produces the following output:

Found a global var: a
Found a global var: c
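The loop above only looks at the root's direct children, which works because, after the change, the grammar creates roots only for FUNCTION and VARIABLE. If you later add rewrite rules that give other constructs (blocks, if statements, ...) their own roots, a var declared inside such a construct at the top level would no longer be a direct child of the root, although in JavaScript it is still global (var is function-scoped). A recursive walk that simply refuses to descend into FUNCTION nodes keeps working in that case. The sketch below models this without ANTLR so it runs standalone; Node, OTHER, leaf and exampleTree are hypothetical stand-ins for CommonTree and the generated token constants, and the hand-built tree only approximates the one ANTLR actually produces.

```java
import java.util.ArrayList;
import java.util.List;

public class GlobalVars {
    // Hypothetical stand-ins for the token types the generated parser defines
    // (e.g. JavaScriptParser.VARIABLE and JavaScriptParser.FUNCTION).
    static final int VARIABLE = 1;
    static final int FUNCTION = 2;
    static final int OTHER = 3;

    // Minimal stand-in for an ANTLR CommonTree node: a type, some text, and children.
    record Node(int type, String text, List<Node> children) {
        Node(int type, String text, Node... kids) {
            this(type, text, List.of(kids));
        }
    }

    static Node leaf(String text) {
        return new Node(OTHER, text);
    }

    // Collects the identifier of every VARIABLE node that has no FUNCTION ancestor.
    static List<String> globalVariables(Node root) {
        List<String> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(Node node, List<String> found) {
        for (Node child : node.children()) {
            if (child.type() == FUNCTION) {
                continue; // everything below a FUNCTION node is local to that function
            }
            if (child.type() == VARIABLE) {
                found.add(child.children().get(0).text()); // child 0 is the Identifier
            }
            collect(child, found); // descend into any other subtree
        }
    }

    // Hand-built approximation of the tree for:
    //   var a = 1; function foo() { var b = 2; } var c = 3;
    static Node exampleTree() {
        return new Node(OTHER, "nil",
            new Node(VARIABLE, "VARIABLE", leaf("a"), leaf("1")),
            new Node(FUNCTION, "FUNCTION", leaf("foo"),
                new Node(VARIABLE, "VARIABLE", leaf("b"), leaf("2"))),
            new Node(VARIABLE, "VARIABLE", leaf("c"), leaf("3")));
    }

    public static void main(String[] args) {
        for (String name : globalVariables(exampleTree())) {
            System.out.println("Found a global var: " + name);
        }
    }
}
```

With the real runtime, the same walk would use getType(), getChildCount()/getChild(i) and getText() on CommonTree instead of the record accessors.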
Bart Kiers
+1 and answered. Thanks so much for the comprehensive reply.
Christopher Hunt
@Christopher, you're welcome.
Bart Kiers