I'm working on a domain-specific language (DSL) for non-programmers. Non-programmers make a lot of grammar mistakes: they misspell keywords, they don't close parentheses, they don't terminate blocks, etc.

I'm using ANTLR to generate my parser; it provides a nifty mechanism for handling RecognitionExceptions to improve error handling. But I'm finding it pretty hard to develop good error-handling code for my DSL.

At this point, I'm considering ways to simplify the language to make it easier for me to provide users with high-quality error messages, but I'm not really sure how to go about this. I think I want to reduce the ambiguity of errors somehow, but I'm not sure how to implement that idea in a grammar.

In what ways can I simplify my language to improve parse-error messages for my users?

EDIT: Updated to clarify that I'm interested in ways to simplify my language, not just ANTLR error-handling tips in general. (Though, thanks for those!)

A: 

I read an article recently about someone who implemented a simple learning mechanism for his parser. Basically, the idea is to tag the parse errors that ANTLR gives you with the actual cause of the error. For example,

Error: No method "bar" for NilClass: foo

could be tagged as:

Error: Tried to call "bar" on foo, but foo didn't have a value.

The idea actually came from a 2003 paper: Generating LR Syntax Error Messages from Examples. It has also been discussed at the research!rsc blog.
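To make the idea concrete, here is a minimal sketch in Python rather than ANTLR. It is not the paper's implementation: the paper keys messages on the LR parser state and lookahead token, while this sketch assumes a simpler signature of (expected, found). The names `ErrorCatalog` and `toy_probe` are made up for illustration, and `toy_probe` is only a stand-in for a real parser.

```python
from typing import Callable, Dict, Optional, Tuple

# Signature of an error: (what the parser expected, what it found).
# A real LR-based tool would key on (parser state, lookahead token) instead.
Signature = Tuple[str, str]


class ErrorCatalog:
    """Maps raw parse-error signatures to hand-written explanations."""

    def __init__(self, probe: Callable[[str], Optional[Signature]]):
        # probe(source) runs the parser and returns the first error's
        # signature, or None if the source parses cleanly.
        self._probe = probe
        self._messages: Dict[Signature, str] = {}

    def learn(self, bad_source: str, message: str) -> None:
        """Record the friendly message for the error that bad_source triggers."""
        signature = self._probe(bad_source)
        if signature is None:
            raise ValueError("example parses cleanly; nothing to tag")
        self._messages[signature] = message

    def explain(self, source: str) -> Optional[str]:
        """Return a message for the first error in source, or None if it parses."""
        signature = self._probe(source)
        if signature is None:
            return None
        expected, found = signature
        fallback = f"expected {expected} but found {found}"
        return self._messages.get(signature, fallback)


def toy_probe(source: str) -> Optional[Signature]:
    """Hypothetical stand-in parser: only checks that parentheses balance."""
    depth = 0
    for ch in source:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return ("end of input", "')'")
            depth -= 1
    return ("')'", "end of input") if depth else None


catalog = ErrorCatalog(toy_probe)
catalog.learn("f(1", "A '(' was opened but never closed.")
print(catalog.explain("g(2, h(3)"))
```

The point is that each message is taught with one small broken example, and any other input that fails in the same way gets the same explanation. Errors nobody has tagged yet fall back to the raw "expected X but found Y" wording.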

perimosocordiae
Is it by any chance http://research.swtch.com/2010/01/generating-good-syntax-errors.html, which appeared on reddit just recently?
a_m0d
Yes, thanks! That was really bugging me.
perimosocordiae
+1  A: 

You've probably hit the hardest part of using a parser generator compared with a hand-rolled parser.

In my experience, the first thing you'll want to do is make sure you accurately track line and column information, so that you can point the user to the exact spot where the parser thinks the error is.

That should take care of 90% of the problems for users, e.g. missing commas or semicolons at the end of a line.
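As a rough illustration of pointing at the exact spot, here is a small Python sketch that is independent of ANTLR. The function name `format_error` and the convention of 1-based lines with 0-based columns are assumptions of the sketch; adjust them to whatever your parser reports.

```python
def format_error(source: str, line: int, column: int, message: str) -> str:
    """Render an error with the offending line and a caret under the column.

    line is 1-based and column is 0-based; this convention is an assumption
    of the sketch, so adjust it to whatever your parser reports.
    """
    column = max(column, 0)
    lines = source.splitlines()
    # Fall back to an empty line if the parser reports a position past the end.
    text = lines[line - 1] if 1 <= line <= len(lines) else ""
    prefix = text[:column]
    # Keep tabs in the padding so the caret lines up under tab-indented code.
    padding = "".join(ch if ch == "\t" else " " for ch in prefix)
    # If the column lies past the end of the line, pad out to it with spaces.
    padding += " " * (column - len(prefix))
    return f"line {line}, column {column}: {message}\n{text}\n{padding}^"


print(format_error("set x = (1 + 2\nprint x", 1, 14, "missing ')'"))
```

Showing the source line with a caret under it tends to help non-programmers more than a bare "line 1, column 14", since they don't have to count characters themselves.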

The other 10% is where the trouble is.

I normally start by giving my lexical and grammar tokens meaningful names using the paraphrase option.

For example:

SEMI
options {paraphrase="end of line terminator";}
    : ';'
    ;

ifExpr
options {paraphrase="boolean expression";}
    : expr
    ;

ANTLR will use these phrases in any error message that it generates.

Have a look at http://www.antlr2.org/doc/err.html to see how the experts recommend doing it with ANTLR 2, then skim http://www.antlr.org/blog/antlr3/error.handling.tml to see what ANTLR 3 changed. (The ANTLR 2 page is probably the best place to start.)
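The effect on messages is roughly the following. This is a Python sketch of the idea only, not ANTLR's actual implementation; the `PARAPHRASES` table and the function names are hypothetical and simply mirror the two options in the grammar above.

```python
# Hypothetical table mirroring the paraphrase options in the grammar above.
PARAPHRASES = {
    "SEMI": "end of line terminator",
    "ifExpr": "boolean expression",
}


def describe(name: str) -> str:
    """Return the friendly phrase for a token or rule name, if one exists."""
    return PARAPHRASES.get(name, name)


def expecting_message(expected: str, found: str) -> str:
    """Build an 'expecting X, found Y' message using friendly names."""
    return f"expecting {describe(expected)}, found '{found}'"


print(expecting_message("SEMI", "foo"))
print(expecting_message("RPAREN", "foo"))
```

A name without a paraphrase (RPAREN here) leaks through unchanged, which is a quick way to spot tokens that still need a friendly name.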

chollida
+4  A: 

I wrote an article a couple of years ago on recovering line and column numbers in ANTLR errors that might be helpful.

http://tech.puredanger.com/2007/02/01/recovering-line-and-column-numbers-in-your-antlr-ast/

Alex Miller
Thanks. I've updated the question to clarify that I'm specifically looking for ways to simplify my language, not just ANTLR error-handling tips in general.
Dan Fabulich
A: 

I've never used ANTLR, only JavaCC. But since you're implementing a DSL and care about usability, you should take a look at Xtext. It's a framework that:

  • lets you specify a textual grammar for your DSL in EBNF notation
  • generates a parser for you
  • generates an editor with syntax highlighting and immediate feedback on syntactic errors as an Eclipse plugin
  • gives you access to the underlying AST to transform the textual representation that your users create into anything

I attended a presentation last year by itemis, a German company that specializes in DSLs. I was pretty impressed by how easy this stuff is to set up and get working. I used it to create an editor for a small game that uses a textual description of the playing field, which is then parsed and transformed into the game's object model.

Robert Petermeier