ansaurus

Question

MismatchedTokenException in HTML subset grammar

Answer 1


I recommend not testing your grammar with ANTLRWorks: error messages are easily missed in its console, so it may interpret your test input differently from what you expect. Test it with a small custom class like this instead:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("<div level_0>This is some random text</div>");
        TestLexer lexer = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        parser.parse(); // invoke the grammar's entry rule (assumed to be named "parse")
    }
}

Now, the following rule is not correct:

TEXT
  :  (. | '\r' | '\n')
  ;

The . already matches both \r and \n, so it should be:

TEXT
  :  .
  ;

After changing that, you can generate the lexer and parser, compile all .java files and run the Main class:

java -cp antlr-3.2.jar org.antlr.Tool Test.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar Main

which will produce the following error:

line 1:15 mismatched input 'i' expecting '</'

because the i in This is tokenized by the rule I : ('i' | 'I') ; rather than by TEXT.
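To make that conflict concrete, here is a minimal sketch in plain Java. It is a toy model and not ANTLR's actual lexing algorithm: the class RuleOrderDemo, the method tokenTypes and the regular expressions standing in for the I and TEXT rules are all hypothetical, and it assumes that I is declared before TEXT in the grammar. When two rules match the same number of characters, the rule declared first wins:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleOrderDemo {
    // Hypothetical stand-ins for two lexer rules, kept in declaration order.
    // Assumption: I is declared before TEXT, as the error message suggests.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("I", Pattern.compile("[iI]"));
        RULES.put("TEXT", Pattern.compile(".", Pattern.DOTALL));
    }

    // Returns the name of the rule that claims each token of the input.
    static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly longer matches replace the current best,
                // so on a tie the earlier rule keeps the token.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestRule = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            types.add(bestRule);
            pos += bestLen;
        }
        return types;
    }

    public static void main(String[] args) {
        // The 'i' in "This" is claimed by I, not by TEXT.
        System.out.println(tokenTypes("This")); // prints [TEXT, TEXT, I, TEXT]
    }
}
```

In this model the parser receives an I token in the middle of what was meant to be plain text, which mirrors the mismatched input 'i' error above.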

There are more problems with your current approach:

HTML_ATTRIBUTES does too much: you should instead have ATTRIBUTE, = and VALUE rules in the lexer, and move the plural (HTML attributes) to your parser;
your attributes currently cannot contain < and >, which is incorrect (they can contain them, although it is not recommended).

I'd start over if I were you. If you want, I'm willing to propose a start: just say so.

Bart Kiers 2010-06-23 08:30:59

Thanks, it seems I have misunderstood some of the fundamentals when it comes to priority rules. Back to the reference I go! Also, thank you for your offer, but I guess I need to cover the basics a bit better before starting over.

ASV 2010-06-23 10:14:29

@ASV, sure, no problem.

Bart Kiers 2010-06-23 10:27:07
