I am trying to develop a mini DSL for software configuration, using ANTLRWorks for prototyping. A typical source file would look like:

name: myname;
value: myvalue;
flag debug {
   value = debugvalue;
}
if flag(debug) {
   libname = foo_d;
} else {
   libname = foo;
}

Now, I have never taken a formal course on parsing, so I am doing all this by trial and error in ANTLRWorks, with some basic knowledge of BNF grammars. One problem I keep running into is whitespace and newline handling. I defined something like:

program:    statement* EOF;

statement: compound_statement | selection_statement | field_statement;
selection_statement:    'if' expr statement;
statement_list: (WS* statement)+;
compound_statement: '{' statement_list? '}';
field_statement: name_statement | value_statement;
name_statement: 'name' WS* ':' WS* WORD WS* ';';
value_statement: 'value' WS* ':' WS* WORD WS* ';';

// Tokens
WS  : (' ' | '\t' | '\n');
WORD:   ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;

But the whitespace handling is very buggy; it breaks in all kinds of cases. What is the standard way of doing this? Is there any resource for learning this kind of thing quickly (something like building a calculator with conditionals and variables in ANTLR)? The ANTLR grammars I have found are either trivial or full-fledged languages.

+2  A: 

Usually, you would do this by adding the

{ $channel=HIDDEN; }

action to the WS rule, which sends whitespace tokens to a hidden channel so the parser never sees them; see this page, section "Lexer rules", for details.

jpalecek
Thank you. I could swear I had tried this without success, but with the documentation I could make it work as intended.
David Cournapeau
Also, once you do that, you shouldn't need to include WS in your parser rules.
Ted Elliott
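
Putting the answer and the comments together, the grammar from the question might end up looking roughly like the sketch below. This is only an illustration under some assumptions, not the asker's final grammar: the grammar name Config is hypothetical, the '\r' alternative and the trailing + on WS are additions that were not in the original rule, and selection_statement is left out because the question never defines expr.

```antlr
grammar Config;   // hypothetical grammar name

program            : statement* EOF ;

// No WS references anywhere in the parser rules.
statement          : compound_statement | field_statement ;
compound_statement : '{' statement* '}' ;
field_statement    : name_statement | value_statement ;
name_statement     : 'name' ':' WORD ';' ;
value_statement    : 'value' ':' WORD ';' ;

// Tokens
WORD : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;

// Whitespace is still tokenized, but it is sent to the hidden channel,
// so the parser rules above never see it. The '\r' alternative and the
// trailing '+' are additions to the rule posted in the question.
WS   : (' ' | '\t' | '\r' | '\n')+ { $channel=HIDDEN; } ;
```

This uses the ANTLR 3 action syntax quoted in the answer. In ANTLR 4 the same effect is written as a lexer command instead, for example WS : [ \t\r\n]+ -> channel(HIDDEN) ; or -> skip to drop the tokens entirely.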