
I've got a really simple ANTLR grammar that I'm trying to get working, but I'm failing miserably at the moment. I'd really appreciate some pointers on this.

    root    :   (keyword|ignore)*;
    keyword :    KEYWORD;
    ignore  :    IGNORE;

    KEYWORD : ABBRV|WORD;

    fragment WORD : ALPHA+;
    fragment ALPHA : 'a'..'z'|'A'..'Z';
    fragment ABBRV : WORD?('.'WORD);

    IGNORE  : .{ Skip(); };

With the following test input:

"some ASP.NET and .NET stuff. that work."

I want a tree that is just a list of keyword nodes:

"some", "ASP.NET", "and", ".NET", "stuff", "that", "work"

At the moment I get

"some", "ASP.NET", "and", ".NET", "stuff. that"

For some reason the "." ends up inside the last keyword, and "work" is missing altogether.

If I change the ABBRV clause to

    fragment ABBRV : ('.'WORD);

then that part works fine, but "ASP.NET" comes back as two separate keywords, keyword (ASP) and keyword (.NET), whereas I need it as a single token.
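To make the difference between the two ABBRV variants concrete, here is a rough Python sketch that translates each KEYWORD definition into a regular expression and applies it to the test input. The patterns `ORIGINAL` and `CHANGED` and the helper `keywords` are hypothetical translations, not anything ANTLR generates, and they assume ALPHA means ASCII letters only. Python's `re` engine backtracks, so when "stuff." is not followed by a letter it falls back to a plain WORD. As far as I understand, ANTLR 3's generated lexer by default commits to an alternative using limited lookahead and does not retry like this, which is why the grammar misbehaves even though the first pattern describes the wanted tokens.

```python
import re

# Hypothetical regex translations of KEYWORD : ABBRV | WORD (ALPHA assumed to be ASCII letters)
# ABBRV : WORD? ('.' WORD)   -> optional leading word, then '.' and a word
ORIGINAL = re.compile(r"[A-Za-z]*\.[A-Za-z]+|[A-Za-z]+")
# ABBRV : ('.' WORD)         -> only a '.' followed by a word
CHANGED = re.compile(r"\.[A-Za-z]+|[A-Za-z]+")

TEXT = "some ASP.NET and .NET stuff. that work."


def keywords(pattern: re.Pattern, text: str) -> list[str]:
    # findall skips characters that match neither alternative,
    # which roughly plays the role of IGNORE with Skip()
    return pattern.findall(text)


if __name__ == "__main__":
    print(keywords(ORIGINAL, TEXT))  # ['some', 'ASP.NET', 'and', '.NET', 'stuff', 'that', 'work']
    print(keywords(CHANGED, TEXT))   # ['some', 'ASP', '.NET', 'and', '.NET', 'stuff', 'that', 'work']
```

The second line of output shows the split "ASP" / ".NET" described above: without the optional leading WORD, nothing can match "ASP.NET" as one token.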

Any help you can give would be much appreciated.

A: 

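There are a couple of things. First, your ignore parser rule will never be triggered: the IGNORE lexer rule calls Skip(), so the parser never receives an IGNORE token. The ignore rule does not even have to appear in this grammar (leave it out of the root rule as well). Of course, since you were debugging, having the ignore rule makes testing much easier, because you can drop the Skip() from the IGNORE lexer rule and see exactly what is being ignored.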

Now, to explain the test data: since none of the lexer rules matches just WORD '.', the end of your test data is being ignored because of the period right after the text. If you put a space between 'work' and the period, the last word appears and the period does not, which is what you want. The lexer does not know what to do with 'work.' when it comes at the end of the input. If you add another word at the end (with a space between the period and the new word), then 'work.' is passed from the lexer as a single IGNORE token. I would have expected the word to be passed through and only the period to end up in the IGNORE token.
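The 'work.' problem can be illustrated with a simplified, hand-written model of the lexer in Python. This is not ANTLR's generated code, and `scan`, `LexError` and the `peek_after_dot` switch are hypothetical names. With `peek_after_dot=False` the scanner commits to the ABBRV alternative as soon as it has seen WORD '.', so a period followed by a space or by the end of input leaves it with no valid way to finish the token. With `peek_after_dot=True` it looks one character past the period and only consumes it when a letter follows, leaving a lone period to be ignored.

```python
class LexError(Exception):
    """Raised when the scanner has committed to a token it cannot finish."""


def is_alpha(ch: str) -> bool:
    # Mirrors fragment ALPHA : 'a'..'z' | 'A'..'Z'
    return "a" <= ch <= "z" or "A" <= ch <= "Z"


def scan(text: str, peek_after_dot: bool) -> list[str]:
    """Hypothetical model of KEYWORD : WORD | WORD? '.' WORD; other chars are skipped."""
    tokens: list[str] = []
    i, n = 0, len(text)

    def alpha_at(j: int) -> bool:
        return j < n and is_alpha(text[j])

    def word_end(j: int) -> int:
        # Consume ALPHA+ starting at j and return the index just past it
        while alpha_at(j):
            j += 1
        return j

    while i < n:
        start = i
        if alpha_at(i):
            i = word_end(i)
            if i < n and text[i] == ".":
                if alpha_at(i + 1):
                    i = word_end(i + 1)  # WORD '.' WORD, e.g. ASP.NET
                elif not peek_after_dot:
                    # Committed to ABBRV after WORD '.', but no letter follows
                    raise LexError(f"expected a letter after '.' at offset {i + 1}")
                # otherwise leave the lone '.' unconsumed; it is skipped below
            tokens.append(text[start:i])
        elif text[i] == "." and alpha_at(i + 1):
            i = word_end(i + 1)  # '.' WORD, e.g. .NET
            tokens.append(text[start:i])
        else:
            i += 1  # IGNORE: skip one character
    return tokens
```

On the question's test input, `scan(text, peek_after_dot=True)` returns `['some', 'ASP.NET', 'and', '.NET', 'stuff', 'that', 'work']`, while `peek_after_dot=False` raises `LexError` at the period after "stuff".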

WayneH
Thanks Wayne, this is probably a clear sign of my inexperience with ANTLR, but using ANTLRWorks I get NoViableAltExceptions without the "Ignore" rule? Also, the unexpected phrase "stuff. that" is somehow coming back as a keyword rather than an ignore token, and I don't understand why. Thanks for your help.
James Crowley
A: 

I decided to try solving your problem with an ANTLR 3 grammar. This is what I came up with, with some strings attached:

  • Your spec does not contain many rules, and as a result, my grammar is not very thorough.
  • Consider extending KEYW to match more tokens.
  • I don't have a C#-compatible ANTLR set up right now. If you use skip(), capitalize it as Skip() to make it compatible with the C# target.

    grammar TestSplitter;

    // The '!' (omit from tree) operator used in start requires AST output
    options { output = AST; }

    start: (KEYW DELIM!?)* ;
    KEYW: ('a'..'z'|'A'..'Z'|'.')+ ;
    DELIM: '.'? ' '+ ;
    
Kivin