I have the following grammar, written in ANTLRWorks 1.4:

grammar hmm;

s           :   (put_a_in_b)|(put_out_a)|(drop_kick)|(drop_a)|(put_on_a);

put_a_in_b  :   (PUT_SYN)(ID)(IN_SYN)(ID);  
put_out_a   :   (PUT2_SYN)(OUT_SYN)(ID) | (E1)(ID); 
drop_kick   :   ('drop')('kick')(ID);
drop_a      :   (DROP_SYN)(ID);
put_on_a    :   (E2)(ID);

PUT_SYN     :   'put' | 'place' | 'drop';
PUT2_SYN    :   'put' | 'douse';
IN_SYN      :   'in' | 'into' | 'inside' | 'within';    
OUT_SYN     :   'out';
E1          :   'extinguish'|'douse';
DROP_SYN    :   'drop' | 'throw' | 'relinquish';
WS          :   ( ' '  | '\t' | '\r' | '\n' ) {$channel=HIDDEN;};
ID          :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
E2          :   'put on'|'don'|'wear';
COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

When I run it with the input:

drop object

I get a MismatchedTokenException(5 != 15).

And with the input:

put o1 in o2

I get a NoViableAltException.

However, it runs fine with:

place o2 in o2

I'm new to this, but it seems like there are ambiguities? Or maybe I'm using ANTLR incorrectly?

A:

You've put both 'drop' and 'put' in more than one lexer rule:

PUT_SYN  : 'put' | 'place' | 'drop';          // drop & put
PUT2_SYN : 'put' | 'douse';                   //        put
...
DROP_SYN : 'drop' | 'throw' | 'relinquish';   // drop

When the lexer encounters 'put', PUT_SYN will always be the rule that matches it, because it is defined before PUT2_SYN. So 'put' could (or should) be removed from the PUT2_SYN rule.

That also explains your problem with parsing the string "drop object": the parser tries to match drop_a : (DROP_SYN)(ID);, but the lexer has already turned "drop" into a PUT_SYN token, not a DROP_SYN token.
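To see why the order of the lexer rules matters, here is a rough Python model (not ANTLR itself) of a lexer that takes the longest match and, on a tie, gives the token to the rule declared first. The helper names literals, pattern and tokenize are made up for this sketch. It skips whitespace, leaves out COMMENT, and ignores details of ANTLR 3's generated lexer such as its lookahead-based prediction and the implicit tokens created for literals like 'drop' and 'kick' written directly in parser rules. So it illustrates the rule-priority problem but does not reproduce every error from the question; for example, it tokenizes put o1 in o2 without complaint.

```python
import re

# Simplified model of rule priority; NOT the algorithm ANTLR 3 actually generates.

def literals(*words):
    """Matcher for a rule like  'put' | 'place' | 'drop' : longest literal at pos."""
    def match(text, pos):
        return max((len(w) for w in words if text.startswith(w, pos)), default=0)
    return match

def pattern(regex):
    """Matcher for a character-class rule such as ID."""
    compiled = re.compile(regex)
    def match(text, pos):
        m = compiled.match(text, pos)
        return m.end() - pos if m else 0
    return match

# The question's lexer rules, in declaration order (WS is skipped below, COMMENT omitted).
RULES = [
    ("PUT_SYN",  literals("put", "place", "drop")),
    ("PUT2_SYN", literals("put", "douse")),
    ("IN_SYN",   literals("in", "into", "inside", "within")),
    ("OUT_SYN",  literals("out")),
    ("E1",       literals("extinguish", "douse")),
    ("DROP_SYN", literals("drop", "throw", "relinquish")),
    ("ID",       pattern(r"[a-zA-Z_][a-zA-Z0-9_]*")),
    ("E2",       literals("put on", "don", "wear")),
]

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] in " \t\r\n":  # WS goes to the hidden channel: skip it
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, match in RULES:
            n = match(text, pos)
            if n > best_len:  # strict '>': on a tie the earlier rule keeps the token
                best_name, best_len = name, n
        if best_name is None:
            raise SyntaxError(f"no lexer rule matches {text[pos]!r} at {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("drop object"))
```

Under this model, "drop object" comes out as a PUT_SYN followed by an ID, so drop_a can never match, and 'douse' always becomes PUT2_SYN rather than E1. The same tie-break also hands 'don' and 'wear' to ID, since ID is declared before E2.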

EDIT

Those synonym lists are better written as parser rules instead of lexer rules. Here's a small demo:

grammar TextAdventure;

parse
  :  command (EndCommand command)* EOF
  ;

command
  :  put_syn_1 OtherWord in_syn OtherWord
  |  put_syn_2 out_syn_1 OtherWord
  |  out_syn_2 OtherWord
  |  Drop Kick OtherWord
  |  drop_syn OtherWord
  ;

drop_syn
  :  Drop
  |  Throw 
  |  Relinquish
  ;

in_syn
  :  In
  |  Into
  |  Inside
  |  Within
  ; 

put_syn_1
  :  Put
  |  Place
  |  Drop
  ;

put_syn_2
  :  Put
  |  Douse
  ;

out_syn_1
  :  Out
  ;

out_syn_2
  :  Extinguish
  |  Douse
  ;

Space      : (' ' | '\t' | '\r' | '\n'){$channel=HIDDEN;};
EndCommand : ';';
Put        : 'put';
Place      : 'place';
Drop       : 'drop';
Douse      : 'douse';
In         : 'in';
Into       : 'into';
Inside     : 'inside';
Within     : 'within';    
Out        : 'out';
Extinguish : 'extinguish';
Throw      : 'throw';
Relinquish : 'relinquish';
Kick       : 'kick';
OtherWord  : ('a'..'z' | 'A'..'Z')+;

When interpreting the following source:

drop object ; put yourself in myshoes ; place it in avase

you'll see ANTLRWorks generate the following parse-tree:

[Image: ANTLRWorks parse tree for the input above]
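The same idea (every keyword is its own token, and the synonyms are grouped in parser rules) can be modelled outside ANTLR. The following is a small hypothetical Python sketch, not generated ANTLR code: the sets mirror put_syn_1, in_syn and the other parser rules in the demo grammar, and the alternative labels (put_in, put_out, extinguish, drop_kick, drop) are made up. Instead of ANTLR's LL(*) prediction it simply tries the alternatives of command in grammar order and takes the first whose shape fits, which gives the same result for the inputs used here.

```python
import re

# Hypothetical model of the TextAdventure grammar: keywords are separate tokens,
# synonym "parser rules" are plain sets of keywords.
KEYWORDS = {"put", "place", "drop", "douse", "in", "into", "inside", "within",
            "out", "extinguish", "throw", "relinquish", "kick"}
TOKEN_RE = re.compile(r"\s+|;|[A-Za-z]+")

PUT_SYN_1 = {"put", "place", "drop"}
PUT_SYN_2 = {"put", "douse"}
IN_SYN = {"in", "into", "inside", "within"}
OUT_SYN_1 = {"out"}
OUT_SYN_2 = {"extinguish", "douse"}
DROP_SYN = {"drop", "throw", "relinquish"}
WORD = "OtherWord"

# Alternatives of `command`, in grammar order (labels are made up).
COMMAND_ALTS = [
    ("put_in", [PUT_SYN_1, WORD, IN_SYN, WORD]),
    ("put_out", [PUT_SYN_2, OUT_SYN_1, WORD]),
    ("extinguish", [OUT_SYN_2, WORD]),
    ("drop_kick", [{"drop"}, {"kick"}, WORD]),
    ("drop", [DROP_SYN, WORD]),
]

def tokenize(text):
    """Lexer: keywords, ';' (EndCommand) and OtherWord; whitespace is skipped."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        lexeme, pos = m.group(), m.end()
        if lexeme.isspace():
            continue
        if lexeme == ";":
            tokens.append(("EndCommand", lexeme))
        elif lexeme in KEYWORDS:
            tokens.append(("Keyword", lexeme))
        else:
            tokens.append((WORD, lexeme))
    return tokens

def matches(pattern, command):
    """True if the command's tokens fit one alternative exactly."""
    if len(pattern) != len(command):
        return False
    for expected, (kind, text) in zip(pattern, command):
        if expected == WORD:
            if kind != WORD:
                return False
        elif kind != "Keyword" or text not in expected:
            return False
    return True

def parse(text):
    """parse : command (EndCommand command)* EOF"""
    commands = [[]]
    for kind, lexeme in tokenize(text):
        if kind == "EndCommand":
            commands.append([])
        else:
            commands[-1].append((kind, lexeme))
    result = []
    for command in commands:
        words = [lexeme for _, lexeme in command]
        for label, pattern in COMMAND_ALTS:
            if matches(pattern, command):
                result.append((label, words))
                break
        else:
            raise SyntaxError(f"no viable alternative for {' '.join(words)!r}")
    return result

print(parse("drop object ; put yourself in myshoes ; place it in avase"))
```

Because "drop" is a single Keyword token that appears in both PUT_SYN_1 and DROP_SYN, the lexer no longer has to decide what "drop" means; the parser picks the alternative from the shape of the whole command. For the input above it returns [("drop", ["drop", "object"]), ("put_in", ["put", "yourself", "in", "myshoes"]), ("put_in", ["place", "it", "in", "avase"])].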

Bart Kiers
Sounds plausible... What is the workaround to get the various alternatives? Use a non-terminal for 'drop' (and another for 'put') and then build the alternatives using that non-terminal?
Jonathan Leffler
Thanks for the explanation, Bart. I too am wondering about a workaround.
Rao
The solution is to factor out that commonality and put the keyword `put` into its own rule. Something like `PUT_SYN: 'put' (PUT_CMD); PUT_CMD: (ID) ...|(OUT_SYN) ...;` That is just an example of what I mean by 'factoring' out.
linuxuser27
@Rao, @Jonathan, the fix/workaround is what @linuxuser27 mentioned.
Bart Kiers
@Rao (or @rikki), perhaps you'd like to explain what kind of language you're trying to parse, because I see quite a few odd things in your grammar that might need fixing.
Bart Kiers
I'm jotting down ideas for a text-adventure game creator. The user will enter the various commands he wishes to allow in the game, and these commands may have aliases. My question reflects the subtleties that arise when different commands share the same "words" or "tokens". I was planning to use a lot of C# Dictionary<TKey, TValue>s, but decided to play around with ANTLR, both for prototyping and perhaps as a library for implementing the project.
Rao
@Rao, see my edit.
Bart Kiers
Wow, thanks! Works perfectly! I'll spend some time dwelling over it now. At present, I think this approach is good for my project. (I'm only sorry I can't 'accept' your answer, differing accounts and all.)
Rao
@Rao, you're welcome, and no worries about not being able to accept my answer.
Bart Kiers