I am trying to develop a mini DSL for software configuration, using ANTLRWorks for prototyping. A typical source file would look like:
name: myname;
value: myvalue;
flag debug {
    value = debugvalue;
}
if flag(debug) {
    libname = foo_d;
} else {
    libname = foo;
}
Now, I never took a formal course on parsing, so I am doing all this by trial and error with ANTLRWorks and some basic knowledge of BNF grammars. One problem I keep running into is whitespace and newline handling. I defined something like:
program: statement* EOF;
statement: compound_statement | selection_statement | field_statement;
selection_statement: 'if' expr statement;
statement_list: (WS* statement)+;
compound_statement: '{' statement_list? '}';
field_statement: name_statement | value_statement;
name_statement: 'name' WS* ':' WS* WORD WS* ';';
value_statement: 'value' WS* ':' WS* WORD WS* ';';
// Tokens
WS : (' ' | '\t' | '\n');
WORD: ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
But the whitespace handling is very buggy; it breaks in all kinds of cases. What is the standard way of doing this? Is there any resource for learning this kind of thing quickly (something like building a calculator with conditionals and variables in ANTLR)? The ANTLR grammars I have found are either trivial or full-fledged languages.
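For reference, the approach I have most often seen suggested is to let the lexer discard whitespace so that the parser rules never mention WS at all. Below is a minimal sketch of that idea, assuming ANTLR 3 syntax (which is what ANTLRWorks 1.x uses). The grammar name Config is made up for illustration, and selection_statement is left out because expr is not defined in my grammar above.

```
grammar Config;  // hypothetical grammar name

// Parser rules: no WS references anywhere
program: statement* EOF;
statement: compound_statement | field_statement;
compound_statement: '{' statement* '}';
field_statement: name_statement | value_statement;
name_statement: 'name' ':' WORD ';';
value_statement: 'value' ':' WORD ';';

// Tokens
WORD: ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;

// Match runs of whitespace (including '\r') and send them to the
// hidden channel, so the parser never sees them
WS: (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; };
```

In ANTLR 4 the equivalent would be to end the WS rule with `-> skip` instead of the embedded action. I have not verified that this covers every case that breaks in my grammar.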