Forgive me, I'm completely new to parsing and lex/yacc, and I'm probably in way over my head, but nonetheless:

I'm writing a pretty basic calculator with PLY, but its input might not always be an equation, and I need to determine whether it is one while parsing. At one extreme is input that is a valid equation, which the parser handles and evaluates correctly; at the other is input that is nothing like an equation, which fails to parse, and that is also fine.

The gray area is input that contains equation-like parts, which the parser picks out and evaluates. That isn't what I want: I need to be able to tell when parts of the string weren't picked up and tokenized, so I can report an error, but I have no idea how to do this.

Does anyone know how I can define, basically, a 'catch anything that's left' token? Or is there a better way I can handle this?

+1  A: 
Don
That was the trick. I added a t_error token that just returned false, and everything works perfectly. Thanks!
bck
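In PLY, the lexer calls a user-supplied t_error(t) function whenever it meets a character that no token rule matches; that is the hook the comment above relies on. The sketch below shows the same "catch anything that's left" idea without PLY, using only the standard library: the last alternative of the master regex, MISMATCH, matches any single character the other rules rejected, and the tokenizer raises instead of skipping it. It is adapted from the tokenizer recipe in Python's re documentation; the names TOKEN_SPEC, MISMATCH, LexError and tokenize are hypothetical, and the token set is only an assumption about what a basic calculator needs.

```python
import re

# (name, regex) pairs, tried left to right. MISMATCH is the catch-all:
# it matches any single character that no earlier rule accepted.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("TIMES", r"\*"),
    ("DIVIDE", r"/"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"[ \t]+"),
    ("MISMATCH", r"."),
]
# DOTALL lets the catch-all also match newlines, so finditer never
# silently steps over an unmatched character.
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC), re.DOTALL
)


class LexError(ValueError):
    """Raised when the input contains a character no token rule matches."""


def tokenize(text):
    """Return a list of (kind, value) pairs, or raise LexError."""
    tokens = []
    for match in MASTER_RE.finditer(text):
        kind = match.lastgroup
        value = match.group()
        if kind == "SKIP":
            continue  # whitespace is matched but not emitted
        if kind == "MISMATCH":
            raise LexError(f"illegal character {value!r} at position {match.start()}")
        tokens.append((kind, value))
    return tokens
```

With this, tokenize("1 + 1") yields three tokens, while tokenize("1 + 1 apples") raises LexError at the "a" instead of quietly returning the equation-like prefix.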
A: 

I typically use a separate 'command reader' to obtain a complete command (probably a line, in your case) into a host variable string, and then arrange for the lexical analyzer to analyze that string, including telling me when it didn't reach the end. This is hard to set up, but it makes some classes of error reporting easier. One of the places where I use this technique routinely has multi-line commands with three comment conventions, two sets of quoted strings, and some other nasties to set my teeth on edge (context-sensitive tokenization, yuck!).
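A minimal sketch of the command-reader idea, assuming one command per line and plain Python instead of PLY (with PLY you would hand each command string to the lexer via lexer.input()). The reader yields complete commands; the lexer then consumes the string token by token and reports the position where it stopped, so anything short of the full length means part of the command was never tokenized. read_commands, lex_command and check are hypothetical names, and a real reader with multi-line commands and comment conventions would be considerably more involved.

```python
import io
import re

TOKEN_RE = re.compile(r"(?P<NUMBER>\d+)|(?P<OP>[-+*/()])")
WS_RE = re.compile(r"\s*")


def read_commands(stream):
    """Hypothetical command reader: yield one complete command per non-blank line."""
    for line in stream:
        command = line.strip()
        if command:
            yield command


def lex_command(command):
    """Lex a whole command string and return (tokens, stop_position).

    stop_position == len(command) means every character was consumed.
    """
    tokens = []
    pos = 0
    while pos < len(command):
        pos = WS_RE.match(command, pos).end()  # skip whitespace between tokens
        if pos == len(command):
            break
        match = TOKEN_RE.match(command, pos)
        if match is None:
            break  # no token starts here: the lexer stops short of the end
        tokens.append(match.group())
        pos = match.end()
    return tokens, pos


def check(command):
    """Report either the tokens or where lexing stopped."""
    tokens, stop = lex_command(command)
    if stop != len(command):
        return f"error: unrecognized input at column {stop + 1}: {command[stop:]!r}"
    return f"ok: {tokens}"


for cmd in read_commands(io.StringIO("1 + 2\n\n3 * (4 - 1)\n2 + two\n")):
    print(check(cmd))
```

Because the lexer is given exactly one command, "didn't reach the end" is a single comparison, and the reported column points at the leftover text ('two' in the last sample line).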

Otherwise, Don's advice with the Yacc 'error' token is good.

Jonathan Leffler
+1  A: 

Define an end-of-input token, and make your lexer emit it when it reaches the end of the input.

So before, if you had these tokens:

'1' 'PLUS' '1'

You'll now have:

'1' 'PLUS' '1' 'END_OF_INPUT'

Now you can require that token in your parser's top-level rule. Instead of (for example):

Equation ::= EXPRESSION

You'll have

Equation ::= EXPRESSION END_OF_INPUT

Obviously you'll have to rewrite these in PLY syntax, but this should get you most of the way.
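PLY itself builds an LALR parser from p_ functions whose docstrings hold the grammar rules; the stdlib-only sketch below instead uses a small hand-written recursive-descent parser, just to show the effect of the END_OF_INPUT rule. The tokenizer appends an END_OF_INPUT sentinel and passes unrecognized characters through as UNKNOWN tokens, and the top-level equation rule only succeeds if END_OF_INPUT follows the expression. The grammar, token names other than END_OF_INPUT, and the Parser/evaluate names are assumptions made for illustration.

```python
import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("TIMES", r"\*"),
    ("DIVIDE", r"/"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
    ("UNKNOWN", r"."),  # anything else is handed to the parser, which will reject it
]
MASTER_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.DOTALL)


def tokenize(text):
    tokens = [(m.lastgroup, m.group()) for m in MASTER_RE.finditer(text)
              if m.lastgroup != "SKIP"]
    tokens.append(("END_OF_INPUT", ""))  # sentinel marking the end of the input
    return tokens


class ParseError(ValueError):
    pass


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def expect(self, kind):
        tok_kind, value = self.tokens[self.pos]
        if tok_kind != kind:
            raise ParseError(f"expected {kind}, got {tok_kind} {value!r}")
        self.pos += 1
        return value

    # equation : expression END_OF_INPUT
    def equation(self):
        value = self.expression()
        self.expect("END_OF_INPUT")  # fails if any tokens are left over
        return value

    # expression : term ((PLUS | MINUS) term)*
    def expression(self):
        value = self.term()
        while self.peek() in ("PLUS", "MINUS"):
            op = self.expect(self.peek())
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term : factor ((TIMES | DIVIDE) factor)*
    def term(self):
        value = self.factor()
        while self.peek() in ("TIMES", "DIVIDE"):
            op = self.expect(self.peek())
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    # factor : NUMBER | LPAREN expression RPAREN
    def factor(self):
        if self.peek() == "LPAREN":
            self.expect("LPAREN")
            value = self.expression()
            self.expect("RPAREN")
            return value
        return int(self.expect("NUMBER"))


def evaluate(text):
    return Parser(text).equation()
```

Calling Parser("1 + 1 2").expression() on its own returns 2 and silently ignores the trailing "2", which is exactly the gray-area behavior from the question; evaluate("1 + 1 2") and evaluate("1 + 1 apples") raise ParseError instead. The sentinel also means the parser never has to check for running off the end of the token list.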

Paul Hankin
This is also how standard yacc works: it accepts only when no more tokens are left.
Ingo
A: 

It looks like you've already found a solution but I'll add another suggestion in case you or others are interested in an alternative approach.

You say you are using PLY, but is that because you want the compiler to run in a Python environment? If so, you might consider other tools as well. For such jobs I often use ANTLR (http://www.antlr.org), which has a Python code generator. ANTLR has lots of useful tricks: consuming a stretch of input at the lexer level so the parser never sees it (e.g. comments); invoking a sub-rule (e.g. equation) within a larger grammar, which should terminate once the rule has been matched without processing any more input (that sounds somewhat like what you want to do); and a very nice left-factoring algorithm.

ANTLR's parsing capability combined with the StringTemplate (http://www.stringtemplate.org) engine makes a nice combination, and both support Python (among many other languages).

Michael Tiller