I'm working on a JavaScript collator/compositor implemented in Java. It works, but there has to be a better way to implement it, and I think a lexer may be the way forward - I'm just a little fuzzy on the details.

I've developed a meta syntax for the compositor which is a subset of the JavaScript language. As far as a typical JavaScript interpreter is concerned, the compositor meta syntax is legal, just not functional (I'm using synonyms of reserved words as labels, followed by code blocks, which the compositor is supposed to interpret). Right now I'm using a scanner and regexes to find the meta syntax in source files, then doing a shallow lexical transform whenever a legal expression is detected.
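
Roughly, the detection side of what I'm doing today looks something like this (a simplified sketch; the class name and the pattern are illustrative, not my actual code):

import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexScanSketch {

    // matches lines such as "use: ie.ondevice.lang.Mixin;"
    private static final Pattern USE_DIRECTIVE =
            Pattern.compile("^\\s*use:\\s*([\\w.]+)\\s*;");

    public static List<String> findUseDirectives(String source) {
        List<String> namespaces = new ArrayList<String>();
        Scanner lines = new Scanner(source);
        while (lines.hasNextLine()) {
            Matcher m = USE_DIRECTIVE.matcher(lines.nextLine());
            if (m.find()) {
                namespaces.add(m.group(1));
            }
        }
        lines.close();
        return namespaces;
    }
}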

There is a tight coupling between the rewritten JavaScript and the scanner/parser, which I am not happy with: the rewritten JavaScript uses features of an object support library written specially for the purpose, and that library is subject to change.

I'm hoping that I can declare just the meta syntax in Backus-Naur form or EBNF, feed it to a lexer/parser generator (ANTLR?), and, on the basis of the meta syntax expressions detected in source files, direct the compositor to certain actions, such as prepending a required script to another, declaring a variable, generating the text for a suitably parameterised library function invocation, or even compressing a script.

Is this the appropriate way to make a compositor? Should I even be using a Scanner/Parser/Lexer approach to compositing JavaScript? Any feedback appreciated- I'm not quite sure where to start :)

UPDATE: Here is a more concrete example - a sample object declaration using the meta syntax:

namespace: ie.ondevice
{
    use: ie.ondevice.lang.Mixin;
    use: ie.ondevice.TraitsDeclaration;

    declare: Example < Mixin | TraitsDeclaration
    {
        include: "path/to/file.extension";
        // implementation here
    }
}

This describes the object ie.ondevice.Example, which inherits from Mixin and resembles (i.e. 'implements the same functions and traits as') TraitsDeclaration. The compositor would detect the use statements and fail if a namespace does not map to a valid file location, or otherwise prepend the scripts in which the used object declarations reside, preprocessing the meta syntax there before collation.
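
For example, the namespace-to-file resolution I have in mind would be along these lines (a sketch only - the source root and the .js naming convention are assumptions):

import java.io.File;

public class NamespaceResolverSketch {

    private final File sourceRoot;

    public NamespaceResolverSketch(File sourceRoot) {
        this.sourceRoot = sourceRoot;
    }

    // maps e.g. "ie.ondevice.lang.Mixin" to <sourceRoot>/ie/ondevice/lang/Mixin.js
    public File resolve(String namespace) {
        String relativePath = namespace.replace('.', File.separatorChar) + ".js";
        File file = new File(sourceRoot, relativePath);
        if (!file.isFile()) {
            throw new IllegalStateException(
                    "use: " + namespace + " does not map to a file: " + file);
        }
        return file;
    }
}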

The rewrite rules, expressed in terms of the object support library I mentioned, would result in a file that might look like this (I've developed a number of ways to express the object):

module("ie.ondevice.Example", function (mScope)
{
   // mScope is a delegate
   mScope.use("ie.ondevice.lang.Mixin");
   mScope.use("ie.ondevice.TraitsDeclaration");

   // As a result of the two use statements, the mScope.localVars string would
   // look like this: "var Mixin= ie.ondevice.lang.Mixin, TraitsDeclaration= ie.ondevice.TraitsDeclaration;"
   // by evaling it we introduce the 'imported' objects under their 'local' names
   eval(mScope.localVars); 

   // Function.prototype has been extended with the functions
   // inherits, define, defineStatic, resembles and getName

   // Prototypal inheritance using an anonymous bridge constructor
   Example.inherits(Mixin);

   // named methods and properties are added to Example.prototype
   Example.define
   (
       // functions and other properties
   );
   // ensures that Example.prototype has all the same
   // property names and types as TraitsDeclaration.prototype
   // throwing an exception if not the case.
   // This is optionally turned off for production- these
   // operations are only performed when the object is declared
   // - instantiation incurs no additional overhead
   Example.resembles(TraitsDeclaration);

   // constructor
   function Example ()
   {
       Mixin.call(this);
   };

   // will generate the ie.ondevice object hierarchy if required
   // and avail the constructor to it
   mScope.exports(Example);
 });

Perhaps I'm over-architecting my requirements, but what I would really like is an event-driven collator - listeners could then be loosely coupled to directive detection.
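
Something like this is what I mean by loosely coupled listeners (a sketch; the interface and class names are placeholders, not an existing API):

import java.util.ArrayList;
import java.util.List;

interface DirectiveListener {
    void onUse(String namespace);
    void onDeclare(String name, String inherits, List<String> resembles);
    void onInclude(String path);
}

class DirectiveDispatcher {

    private final List<DirectiveListener> listeners = new ArrayList<DirectiveListener>();

    void addListener(DirectiveListener listener) {
        listeners.add(listener);
    }

    // the scanner/parser, however it is implemented, calls these as it
    // detects directives; collation actions subscribe as listeners
    void fireUse(String namespace) {
        for (DirectiveListener l : listeners) {
            l.onUse(namespace);
        }
    }

    void fireInclude(String path) {
        for (DirectiveListener l : listeners) {
            l.onInclude(path);
        }
    }
}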

A: 

Do you want to build a lexical scanner (a.k.a. "lexer" or tokenizer) or a syntax-directed parser?

bit-twiddler
That, I'm not sure about- I'm new to the domain, but a syntax-directed parser sounds appropriate- would you be able to expand on that a little?
VLostBoy
+2  A: 

Yes, using a parser generator (like ANTLR) is the way to go, IMO. If you provide a more concrete example of what it is you're trying to parse, perhaps I (or someone else) can help you further.

Scott Stanchfield created a couple of good video tutorials for ANTLR starting from the very beginning.

EDIT:

Given your example:

namespace: ie.ondevice
{
    use: ie.ondevice.lang.Mixin;
    use: ie.ondevice.TraitsDeclaration;

    declare: Example < Mixin | TraitsDeclaration
    {
        include: "path/to/file.extension";
        // implementation here
    }
}

here's what a grammar (for ANTLR) could look like:

parse
    :   'namespace' ':' packageOrClass '{'
            useStatement*
            objDeclaration
        '}'
    ;

useStatement
    :    'use' ':' packageOrClass ';'
    ;

includeStatement
    :    'include' ':' StringLiteral ';'
    ;

objDeclaration
    :    'declare' ':' Identifier ( '<' packageOrClass )? ( '|' packageOrClass )* '{' 
             includeStatement* 
         '}'
    ;

packageOrClass
    :    ( Identifier ( '.' Identifier )* )
    ;

StringLiteral
    :    '"' ( '\\\\' | '\\"' | ~( '"' | '\\' ) )* '"'
    ;

Identifier
    :    ( 'a'..'z' | 'A'..'Z' | '_' ) ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' )*    
    ;

LineComment
    :    '//' ~( '\r' | '\n' )* ( '\r'? '\n' | EOF )     
    ;

Spaces
    :    ( ' ' | '\t' | '\r' | '\n' )     
    ;

The above is called a mixed grammar (ANTLR will generate both the lexer and the parser). The rules starting with a capital letter are lexer rules and the ones starting with a lowercase letter are parser rules.

Now you could let the generated parser fill an FJSObject (Fuzzy JavaScript Object):

import java.util.ArrayList;
import java.util.List;

class FJSObject {

    String name;
    String namespace;
    String inherit;
    List<String> use;
    List<String> include;
    List<String> resemble;

    FJSObject() {
        use = new ArrayList<String>();
        include = new ArrayList<String>();
        resemble = new ArrayList<String>();
    }

    @Override
    public String toString() {
        StringBuilder b = new StringBuilder();
        b.append("name      : ").append(name).append('\n');
        b.append("namespace : ").append(namespace).append('\n');
        b.append("inherit   : ").append(inherit).append('\n');
        b.append("resemble  : ").append(resemble).append('\n');
        b.append("use       : ").append(use).append('\n');
        b.append("include   : ").append(include);
        return b.toString();
    }
}

and while your parser is going through the token stream, it simply "fills" the FJSObject's fields. You can embed plain Java code in the grammar by wrapping it in { and }. Here's an example:

grammar FJS;

@parser::members {FJSObject obj = new FJSObject();}

parse
    :   'namespace' ':' p=packageOrClass {obj.namespace = $p.text;}
        '{'
            useStatement*
            objDeclaration
        '}'
    ;

useStatement
    :   'use' ':' p=packageOrClass {obj.use.add($p.text);} ';'
    ;

includeStatement
    :   'include' ':' s=StringLiteral {obj.include.add($s.text);} ';'
    ;

objDeclaration
    :   'declare' ':' i=Identifier {obj.name = $i.text;} 
        ( '<' p=packageOrClass {obj.inherit = $p.text;} )? 
        ( '|' p=packageOrClass {obj.resemble.add($p.text);} )* 
        '{' 
            includeStatement* 
            // ...
        '}'
    ;

packageOrClass
    :   ( Identifier ( '.' Identifier )* )
    ;

StringLiteral
    :   '"' ( '\\\\' | '\\"' | ~( '"' | '\\' ) )* '"'
    ;

Identifier
    :   ( 'a'..'z' | 'A'..'Z' | '_' ) ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' )* 
    ;

LineComment
    :   '//' ~( '\r' | '\n' )* ( '\r'? '\n' | EOF ) {skip();} // ignoring these tokens
    ;

Spaces
    :   ( ' ' | '\t' | '\r' | '\n' ) {skip();} // ignoring these tokens
    ;

Store the above in a file called FJS.g, download ANTLR and let it generate your lexer & parser like this:

java -cp antlr-3.2.jar org.antlr.Tool FJS.g

And to test it, run this:

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;

public class ANTLRDemo {
    public static void main(String[] args) throws Exception {
        String source =
                "namespace: ie.ondevice                             \n"+
                "{                                                  \n"+
                "    use: ie.ondevice.lang.Mixin;                   \n"+
                "    use: ie.ondevice.TraitsDeclaration;            \n"+
                "                                                   \n"+
                "    declare: Example < Mixin | TraitsDeclaration   \n"+
                "    {                                              \n"+
                "        include: \"path/to/file.extension\";       \n"+
                "        // implementation here                     \n"+
                "    }                                              \n"+
                "}                                                    ";
        ANTLRStringStream in = new ANTLRStringStream(source);
        CommonTokenStream tokens = new CommonTokenStream(new FJSLexer(in));
        FJSParser parser = new FJSParser(tokens);
        parser.parse();
        System.out.println(parser.obj);
    }
} 

which should produce the following:

name      : Example
namespace : ie.ondevice
inherit   : Mixin
resemble  : [TraitsDeclaration]
use       : [ie.ondevice.lang.Mixin, ie.ondevice.TraitsDeclaration]
include   : ["path/to/file.extension"]

Now you could let the FJSObject class generate/rewrite your meta/source files. From that class, you could also do checks to see whether an included file actually exists.
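
For example, you could give FJSObject something like the following (just a sketch - the exact output template depends on your object support library):

// possible additions inside the FJSObject class

// emits the rewritten module(...) wrapper for this object
String toModuleSource() {
    StringBuilder b = new StringBuilder();
    b.append("module(\"").append(namespace).append('.').append(name)
            .append("\", function (mScope)\n{\n");
    for (String u : use) {
        b.append("    mScope.use(\"").append(u).append("\");\n");
    }
    b.append("    eval(mScope.localVars);\n");
    // ... the inherits/define/resembles/exports lines would be appended here
    b.append("});\n");
    return b.toString();
}

// checks that every include: directive points at an existing file
boolean includesExist(java.io.File root) {
    for (String inc : include) {
        String path = inc.substring(1, inc.length() - 1); // strip the surrounding quotes
        if (!new java.io.File(root, path).isFile()) {
            return false;
        }
    }
    return true;
}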

HTH.

Bart Kiers
@Bart- thanks for the links- I've updated the post and would be very interested in your feedback!
VLostBoy
@Bart- I'm bowled over- that's clarified a lot for me- especially the strategy of delegating to a Java instance to do the writing for me- I look forward to testing this out. I think it's question answered!
VLostBoy
No problem VLostBoy. Let me know if you need any clarification.
Bart Kiers
A: 

You may want to look into the Mozilla Rhino project. It's a full solution for running JavaScript on the JVM, but the code that parses JavaScript is fairly well encapsulated and can be used without the full execution functionality.
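
For example, parsing without compiling or executing looks roughly like this (a sketch; it assumes a Rhino release that ships the org.mozilla.javascript.ast package, e.g. 1.7R3 or later):

import org.mozilla.javascript.CompilerEnvirons;
import org.mozilla.javascript.Parser;
import org.mozilla.javascript.ast.AstNode;
import org.mozilla.javascript.ast.AstRoot;
import org.mozilla.javascript.ast.NodeVisitor;

public class RhinoParseSketch {
    public static void main(String[] args) {
        String source = "var x = 1; function f(a) { return a + x; }";
        Parser parser = new Parser(new CompilerEnvirons());
        // parse only; nothing is compiled or executed
        AstRoot root = parser.parse(source, "example.js", 1);
        root.visit(new NodeVisitor() {
            public boolean visit(AstNode node) {
                System.out.println(node.getClass().getSimpleName()
                        + " @ line " + node.getLineno());
                return true; // descend into child nodes
            }
        });
    }
}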

Skirwan
I had considered that as an abstract possibility, but I'm not sure that I could use Rhino to collate JavaScript- AFAIK Rhino can only compile JavaScript to Java- do you know, in principle, how Rhino could be used as a programmable script collator?
VLostBoy