I just started with JavaCC, but I am seeing some strange behaviour. I want to verify input in the form of tokens (letters and numbers) which are concatenated with signs (+, -, /) and which can contain parentheses. I hope that was understandable :)

In the main method there is a string which should produce an error, because it has one opening but two closing parentheses, but I do not get a parse exception --> Why?

Does anybody have a clue why I don't get the exception?

I was struggling with left recursion and choice conflicts in my initial attempt, but managed to get past them. Maybe that is where I introduced the problem?!

Oh - and maybe my solution is not very good - ignore this fact... or better, give some advice ;-)

File: CodeParser.jj

 options {
   STATIC=false;
 }

 PARSER_BEGIN(CodeParser)

 package com.testing;

 import java.io.StringReader;
 import java.io.Reader;

 public class CodeParser {

     public CodeParser(String s) 
     {
         this((Reader)(new StringReader(s))); 

     }

     public static void main(String args[])
     {
         try
         {
               /** String has one opening but two closing parentheses --> should produce a parse error */
               String s = "A+BC+-(2XXL+A/-B))";
               CodeParser parser = new CodeParser(s);
               parser.expression();
         }
         catch(Exception e)
         {
               e.printStackTrace();
         }
     }
 }
 PARSER_END(CodeParser)

 TOKEN:
 {
  <code : ("-")?(["A"-"Z", "0"-"9"])+ >
  | <op : ("+"|"/") >
  | <not : ("-") >
  | <lparenthesis : ("(") >
  | <rparenthesis : (")") >
 }

 void expression() :
 {
 }
 {
  negated_expression() | parenthesis_expression() | LOOKAHEAD(2) operator_expression() | <code>
 }

 void negated_expression() :
 {
 }
 {
       <not>parenthesis_expression()
 }

 void parenthesis_expression() :
 {
 }
 {
        <lparenthesis>expression()<rparenthesis>
 }

 void operator_expression() :
 {
 }
 {
       <code><op>expression()
 }

Edit - 11/16/2009

Now I gave ANTLR a try.

I changed some terms to better match my problem domain. I came up with the following code (using the answers on this site), which seems to do the work now:

grammar Code;

CODE    : ('A'..'Z'|'0'..'9')+;
OP  : '+'|'/';

start   : terms EOF;
terms   : term (OP term)*;
term    : '-'? CODE
    | '-'? '(' terms ')';

And by the way... ANTLRWorks is a great tool for debugging/visualizing! It helped me a lot.

Additional info
Above code matches stuff like:

(-Z19+-Z07+((FV+((M005+(M272/M276))/((M278/M273/M642)+-M005)))/(FW+(M005+(M273/M278/M642)))))+(-Z19+-Z07+((FV+((M005+(M272/M276))/((M278/M273/M642/M651)+-M005)))/(FW+(M0))))
A: 

From the JavaCC FAQ:

4.7 I added a LOOKAHEAD specification and the warning went away; does that mean I fixed the problem?

No. JavaCC will not report choice conflict warnings if you use a LOOKAHEAD specification. The absence of a warning doesn't mean that you've solved the problem correctly, it just means that you added a LOOKAHEAD specification.

I would start by trying to get rid of the conflict without using a lookahead first.

rsp
Isn't the Lookahead for deciding which "thing" to use? <code> and <code><op>expression() both start with the same token. So the lookahead is only to choose the right "thing"... is the thing called 'production'?! :)
Kai
Using lookahead means you don't get the error anymore, not that it is actually fixed. Could you use `<code> (<op> expression() )?` to eliminate the need for a lookahead?
rsp
In that case, there are still two productions starting with <code>. Without the lookahead, JavaCC still cannot decide which one to choose without reading one more token. Or am I getting you wrong and you meant something else? Regards - Kai
Kai
After `<code>` is consumed, the parser looks at the next token, and if it is `<op>` it will also consume expression(). No lookahead is needed, as the `<op>` is interpreted when the `<code>` has already been consumed. With `( xxx )?` I meant "optional xxx".
rsp
But consuming the <code> is the primary problem. To consume <code>, the parser has to decide whether it is choosing <code> from expression() or from operator_expression(). It should be the problem that is described on slide 11 here: http://www.cs.sjsu.edu/~mak/lectures/CS153-091029.ppt Apart from that, I tried your solution to maybe optimize the whole thing... to go another way... but I got stuck. Right now I'm taking a look at ANTLR. JavaCC works, except for the problem described above. That's why I'm giving ANTLR a try. But JavaCC looked more comfortable to use :)
Kai
I was thinking about dropping the whole `| <code>` from `expression()` in favor of `<code> (<op> expression() )?` in `operator_expression()`, which could be renamed to `code_expression()` in that case. Good luck with ANTLR.
rsp
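
The left-factoring suggested in the comments above can be sketched as a hand-written parser that never looks more than one token ahead. This is plain Java rather than JavaCC output, and `LeftFactoredDemo`, `Tok` and `parses` are hypothetical names chosen for illustration. The input is assumed to be tokenized already, and the rules mirror the question's grammar with `operator_expression()` and the bare `<code>` alternative merged into one rule.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical hand-written parser for the left-factored grammar; not JavaCC output. */
final class LeftFactoredDemo {
    enum Tok { CODE, OP, NOT, LPAREN, RPAREN, EOF }

    private final List<Tok> tokens;
    private int pos = 0;

    LeftFactoredDemo(List<Tok> tokens) {
        this.tokens = tokens;
    }

    /** The single token of lookahead; EOF once the list is exhausted. */
    private Tok peek() {
        return pos < tokens.size() ? tokens.get(pos) : Tok.EOF;
    }

    private void consume(Tok expected) {
        if (peek() != expected) {
            throw new IllegalStateException("Expected " + expected + " but found " + peek());
        }
        pos++;
    }

    /** expression := NOT parenthesized | parenthesized | codeExpression */
    void expression() {
        switch (peek()) {
            case NOT:
                consume(Tok.NOT);
                parenthesized();
                break;
            case LPAREN:
                parenthesized();
                break;
            case CODE:
                codeExpression();
                break;
            default:
                throw new IllegalStateException("Unexpected " + peek());
        }
    }

    /** parenthesized := LPAREN expression RPAREN */
    private void parenthesized() {
        consume(Tok.LPAREN);
        expression();
        consume(Tok.RPAREN);
    }

    /** codeExpression := CODE ( OP expression )? */
    private void codeExpression() {
        consume(Tok.CODE);       // both former alternatives started with CODE
        if (peek() == Tok.OP) {  // the choice is made only after CODE is consumed
            consume(Tok.OP);
            expression();
        }
    }

    /** Returns true if the tokens form exactly one expression. */
    static boolean parses(Tok... input) {
        LeftFactoredDemo parser = new LeftFactoredDemo(Arrays.asList(input));
        try {
            parser.expression();
            parser.consume(Tok.EOF);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses(Tok.CODE));                   // true
        System.out.println(parses(Tok.CODE, Tok.OP, Tok.CODE)); // true
        System.out.println(parses(Tok.CODE, Tok.OP));           // false: operand missing
    }
}
```

Because every alternative of `expression()` now starts with a different token, the `switch` on one token is enough and no `LOOKAHEAD(2)` equivalent is needed.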
+1  A: 

The problem is that you don't get the error when using the parser, correct? Not that the parser generator is claiming that the grammar is incorrect (which seems to be the discussion in the other answer).

If that's the case, then I suspect that you're seeing the problem because the parser properly matches the expression production, then ignores subsequent input. I haven't used JavaCC for a long time, but iirc it didn't throw an error for not reaching end-of-stream.

Most grammars have an explicit top-level production to match the entire file, looking something like this (I'm sure the syntax is wrong, as I said, it's been a long time):

input : ( expression ) *

Or, there's probably an EOF token that you can use, if you want to process just a single expression.
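
The effect described above can be reproduced with a small hand-written recognizer. This is only a sketch in plain Java, not JavaCC output: the class `EofCheckDemo` and its methods are made up for illustration, and it follows the simplified grammar from the ANTLR edit in the question rather than the original token definitions. In the JavaCC grammar itself, the corresponding change would presumably be a top-level production such as `void start() : {} { expression() <EOF> }`, called from `main` instead of `expression()`.

```java
/** Hand-written recognizer, an illustrative stand-in for the generated parser. */
final class EofCheckDemo {
    private final String input;
    private int pos = 0;

    EofCheckDemo(String input) {
        this.input = input;
    }

    /** Matches one expression and returns; whatever follows is left unread. */
    void expression() {
        term();
        while (hasNext() && (peek() == '+' || peek() == '/')) {
            pos++; // consume the operator
            term();
        }
    }

    /** Top-level rule: one expression, then end of input. */
    void start() {
        expression();
        if (hasNext()) {
            throw new IllegalStateException("Unexpected '" + peek() + "' at position " + pos);
        }
    }

    private void term() {
        if (hasNext() && peek() == '-') {
            pos++; // optional negation
        }
        if (hasNext() && peek() == '(') {
            pos++;
            expression();
            if (!hasNext() || peek() != ')') {
                throw new IllegalStateException("Expected ')' at position " + pos);
            }
            pos++;
        } else {
            int begin = pos;
            while (hasNext() && isCodeChar(peek())) {
                pos++;
            }
            if (pos == begin) {
                throw new IllegalStateException("Expected a code at position " + pos);
            }
        }
    }

    private boolean hasNext() {
        return pos < input.length();
    }

    private char peek() {
        return input.charAt(pos);
    }

    private static boolean isCodeChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** Returns true if the input is accepted, with or without the end-of-input check. */
    static boolean accepts(String s, boolean requireEof) {
        EofCheckDemo parser = new EofCheckDemo(s);
        try {
            if (requireEof) {
                parser.start();
            } else {
                parser.expression();
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "A+BC+-(2XXL+A/-B))"; // one opening, two closing parentheses
        System.out.println(accepts(s, false)); // true: the stray ')' is never read
        System.out.println(accepts(s, true));  // false: rejected by the end-of-input check
    }
}
```

Calling `expression()` directly succeeds because the rule is satisfied before the second `)` is reached; only the rule that also demands end of input notices the leftover character.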

kdgregory
+1  A: 
tomcopeland
I gave ANTLR a try and had the same problem. With ANTLR I tried this solution (adding EOF). That solved the problem :-) Great! Thank you all!
Kai