
I'm trying to write a grammar for various time formats (12:30, 0945, 1:30-2:45, ...) using ANTLR. So far it works like a charm as long as I don't type in characters that haven't been defined in the grammar file.

I'm using the following JUnit test for example:

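    import static org.junit.Assert.assertNotNull;
    import static org.junit.Assert.fail;

    import org.antlr.runtime.ANTLRStringStream;
    import org.antlr.runtime.CharStream;
    import org.antlr.runtime.CommonTokenStream;

    // inside the test method: "15:123" has a three-digit minute field, so parsing should fail
    final CharStream stream = new ANTLRStringStream("12:40-1300,15:123-18:59");
    final TimeGrammarLexer lexer = new TimeGrammarLexer(stream);
    final CommonTokenStream tokenStream = new CommonTokenStream(lexer);
    final TimeGrammarParser parser = new TimeGrammarParser(tokenStream);

    try {
        // timeGrammar_return is generated as a nested class of the parser
        final TimeGrammarParser.timeGrammar_return tree = parser.timeGrammar();
        fail();
    } catch (final Exception e) {
        assertNotNull(e);
    }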

An exception is thrown (as expected) because "15:123" isn't valid. If I try "15:23a", though, no exception is thrown and ANTLR treats it as valid input.
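A likely explanation, assuming ANTLR 3's default lexer error handling: when no lexer rule matches a character, the lexer reports an error (something like "no viable alternative at character 'a'") and then skips that character instead of throwing, so the parser only sees the tokens for "15:23". The Java code below does not use ANTLR at all; it is a hypothetical toy lexer (the token names and `tokenize` are made up) that imitates this skip-and-continue behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ANTLR code: it mimics a lexer that reports an
// unmatched character and then drops it, so the parser never sees it.
class SkippingLexerSketch {

    // The only token types this toy lexer knows (an assumption standing in
    // for the real TimeGrammar lexer rules).
    static String tokenTypeOf(char c) {
        if (c >= '0' && c <= '9') return "DIGIT";
        if (c == ':') return "COLON";
        if (c == '-') return "DASH";
        if (c == ',') return "COMMA";
        return null; // no rule matches this character
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String type = tokenTypeOf(c);
            if (type == null) {
                // Report the problem, similar to ANTLR's default message...
                System.err.println("no viable alternative at character '" + c + "'");
                // ...then skip the character and carry on lexing.
                continue;
            }
            tokens.add(type + "(" + c + ")");
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Both print [DIGIT(1), DIGIT(5), COLON(:), DIGIT(2), DIGIT(3)],
        // so a parser fed by this lexer cannot tell the inputs apart.
        System.out.println(tokenize("15:23"));
        System.out.println(tokenize("15:23a"));
    }
}
```

Because the offending character never reaches the token stream, the parser has nothing to complain about; the fixes in the answers work by turning such characters into a real token instead.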

If I define a rule for letters in my grammar, though, ANTLR notices them and I get the exception I want again:

    CHAR: ('a'..'z')|('A'..'Z');

But how do I exclude umlauts, symbols and anything else a user might type (äöü{%&<>!)? Basically, I'm looking for some kind of syntax that says: match everything BUT "0..9,:-".
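The CHAR rule only covers the ASCII ranges a..z and A..Z, so every character in the example list above still has no matching rule. The hypothetical Java helper below (not generated ANTLR code) mirrors that rule and lists which characters of an input it would not match; non-ASCII characters are written as Java Unicode escapes so the source compiles regardless of file encoding.

```java
// Hypothetical helper mirroring the lexer rule
//     CHAR: ('a'..'z')|('A'..'Z');
// It shows that CHAR covers ASCII letters only.
class CharRuleCoverage {

    static boolean matchesCharRule(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Returns the characters of the input that no CHAR token would match.
    static String uncovered(String input) {
        StringBuilder missing = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (!matchesCharRule(c)) missing.append(c);
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // "a" and "Z" are covered; the umlauts and symbols are not.
        System.out.println(uncovered("aZ\u00e4\u00f6\u00fc{%&<>!"));
    }
}
```

Adding more positive ranges one by one never ends, which is why a rule for "everything except the allowed set" is the more practical approach.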

A: 

You can define a lexer rule that matches all the characters you do not want. If this token is not used in any of your parser rules, ANTLR will raise a recognition error (for example a NoViableAltException) when such a character shows up.

For Unicode input, this could look like this:

    UTF8 :  ( '\u0000'..'\u002B'   // control characters, space, ! " # $ % & ' ( ) * +
            | '\u002E'..'\u002F'   // . /
            | '\u003B'..'\uFFFE'   // ; < = > ? @, letters, brackets, umlauts and everything else
            )
         ;
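Hand-written ranges like these are easy to get slightly wrong: '\u002B' is '+', which lies between '*' and ',' and is not an allowed character, so it has to fall inside one of the ranges; and characters above '\u00FF' (such as '€') are only caught if the last range extends further. The Java sketch below is a hypothetical checker, not part of ANTLR. It compares each range table against the intended complement of "0-9 , : -" for every char value up to 0xFFFE. Both tables are assumptions for illustration: ORIGINAL uses '\u0000'..'\u002A', '\u002E'..'\u002F', '\u003B'..'\u00FF', and FIXED ends the first range at '\u002B' and the last at '\uFFFE'.

```java
// Hypothetical checker (not ANTLR code): compares the alternatives of a
// catch-all lexer rule with the set it is supposed to complement.
class ComplementRangeCheck {

    // Assumed copy of the ranges with the first one ending at '*'.
    static final int[][] ORIGINAL = { {0x0000, 0x002A}, {0x002E, 0x002F}, {0x003B, 0x00FF} };

    // Assumed corrected ranges: first ends at '+', last at 0xFFFE.
    static final int[][] FIXED = { {0x0000, 0x002B}, {0x002E, 0x002F}, {0x003B, 0xFFFE} };

    // The characters the time grammar should accept: 0-9 , : -
    static boolean allowed(char c) {
        return (c >= '0' && c <= '9') || c == ',' || c == ':' || c == '-';
    }

    // True if c falls inside any of the inclusive ranges.
    static boolean inRanges(int[][] ranges, char c) {
        for (int[] r : ranges) {
            if (c >= r[0] && c <= r[1]) return true;
        }
        return false;
    }

    // First char value (up to 0xFFFE) that is either allowed AND caught, or
    // neither allowed NOR caught; -1 if the ranges are an exact complement.
    static int firstMismatch(int[][] ranges) {
        for (int i = 0; i <= 0xFFFE; i++) {
            char c = (char) i;
            if (allowed(c) == inRanges(ranges, c)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(firstMismatch(ORIGINAL))); // 2b, i.e. '+'
        System.out.println(firstMismatch(FIXED));                          // -1
    }
}
```

An exhaustive loop over all 65,535 values is cheap and catches gaps that are hard to spot by reading hex ranges.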
nebenmir
A: 

> ...
> So basically I'm looking for some kind of syntax that says: match everything BUT "0..9,:-"

The following rule matches any single character except a digit, ',', ':' and '-':

    Foo
      :  ~('0'..'9' | ',' | ':' | '-')
      ;

(inside lexer rules, ~ negates a set of single characters)
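Since no parser rule references Foo, a Foo token is something the parser has no rule for, so (assuming the start rule is anchored with EOF, so trailing tokens are not silently ignored) it ends up as a syntax error instead of a silently skipped character. The Java sketch below only mimics the character test the rule performs; isFoo and firstFoo are hypothetical helpers, not generated ANTLR code.

```java
// Hypothetical Java equivalent of the character test in
//     Foo : ~('0'..'9' | ',' | ':' | '-') ;
// It shows which characters would be lexed as Foo tokens.
class NegatedSetSketch {

    static boolean isFoo(char c) {
        boolean inSet = (c >= '0' && c <= '9') || c == ',' || c == ':' || c == '-';
        return !inSet; // "~" means: any single character NOT in the set
    }

    // Index of the first character that would become a Foo token, or -1.
    static int firstFoo(String input) {
        for (int i = 0; i < input.length(); i++) {
            if (isFoo(input.charAt(i))) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstFoo("12:40-1300,15:23")); // -1, only allowed characters
        System.out.println(firstFoo("15:23a"));           // 5, the 'a'
    }
}
```

Note that the negated set only describes single characters; it says nothing about whether "15:123" is a valid time, which is still the parser's job.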

But you might want to post your entire grammar: I get the impression there are other things in it that aren't done quite the way they should be. Your call.

Bart Kiers