I'm writing a C parser using PLY, and recently ran into a problem. This code:

    typedef int my_type;
    my_type x;

Is correct C code, because my_type is defined as a type before it is used as one. I handle this by filling a type symbol table in the parser, which the lexer consults to differentiate between type names and plain identifiers.

However, although the type declaration rule ends with SEMI (the ';' token), PLY shifts the token my_type from the second line before deciding it's done with the first one. Because of this, I get no chance to pass the type-table update to the lexer, and it sees my_type as an identifier rather than a type.
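To make the timing concrete, here is a minimal, PLY-free sketch of the feedback scheme (hypothetical names, not pycparser's actual code): a tokenizer that consults a type table, driven with one token of lookahead the way an LALR parser such as PLY's is. The lookahead token is classified before the parser gets to run the typedef action:

```python
# PLY-free sketch (hypothetical, not pycparser's code): a tokenizer that
# consults a type table, driven with one token of lookahead.
type_table = set()          # names the parser registers on 'typedef ... ;'

def classify(name):
    # The lexer's decision point: TYPEID if the name is a known typedef.
    return "TYPEID" if name in type_table else "ID"

words = ["typedef", "int", "my_type", ";", "my_type", "x", ";"]

stream = []
lookahead = None
for word in words:
    # The right-hand side runs first: the next token is classified BEFORE
    # the parser acts on the current one.
    current, lookahead = lookahead, (classify(word), word) if word.isidentifier() else (word, word)
    if current is None:
        continue
    stream.append(current)
    if current[0] == ";":
        # The typedef declaration only reduces here -- too late: the
        # lookahead ('my_type' from line 2) was already lexed as an ID.
        type_table.add("my_type")   # hard-coded for the sketch
stream.append(lookahead)

print(stream[4])  # ('ID', 'my_type') -- classified before the table update
```

The second occurrence of my_type comes out as a plain ID even though the table does get updated, which is exactly the symptom described above.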

Any ideas for a fix?

The full code is at http://code.google.com/p/pycparser/source/browse/trunk/src/c_parser.py. I'm not sure how to create a smaller example from this.

Edit:

Problem solved. See my solution below.

A: 

I think you need to move the check for whether an ID is a TYPEID from c_lexer.py to c_parser.py.

As you said, since the parser is looking ahead 1 token, you can't make that decision in the lexer.

Instead, alter your parser to check IDs in declarations to see if they are TYPEIDs, and, if they aren't, generate an error.
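A sketch of what that parser-side check could look like (hypothetical names and a standalone function rather than a real PLY p_* rule): the lexer emits every name as a plain ID, and the declaration action decides whether it names a known type.

```python
# Hypothetical sketch of the parser-side check: the lexer emits every name
# as a plain ID, and the declaration action decides whether it names a type.
known_types = {"int", "char", "my_type"}   # filled in by typedef actions

def resolve_specifier(token_type, value):
    # Promote an ID to TYPEID if it was registered as a type name;
    # otherwise raise the error the lexer can no longer detect.
    if token_type == "ID":
        if value not in known_types:
            raise SyntaxError(f"{value!r} does not name a type")
        return ("TYPEID", value)
    return (token_type, value)

print(resolve_specifier("ID", "my_type"))  # ('TYPEID', 'my_type')
```

The trade-off is that the grammar can no longer distinguish ID from TYPEID positionally, so ambiguous rules may need restructuring.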

As Pax Diablo said in his excellent answer, the lexer/tokenizer's job isn't to make those kinds of decisions about tokens. That's the parser's job.

Mike G.
+2  A: 

Not sure why you're doing that level of analysis in your lexer.

Lexical analysis should probably be used to separate the input stream into lexical tokens (numbers, line changes, keywords and so on). It's the parsing phase that should be doing that level of analysis, including table lookups for typedefs and such.

That's the way I've always separated duties between lex and yacc, my tools of choice (too old to change :-).

paxdiablo
I agree. Whenever I've had these types of problems it's usually because I'm trying to have the lexer do too much. But I use ANTLR now instead of LEX and YACC (**not** too old to change).
David G
+2  A: 

With some help from Dave Beazley (PLY's creator), my problem was solved.

The idea is to use special sub-rules and do the actions in them. In my case, I split the declaration rule to:

def p_decl_body(self, p):
    """ decl_body : declaration_specifiers init_declarator_list_opt
    """
    # <<Handle the declaration here>>        

def p_declaration(self, p):
    """ declaration : decl_body SEMI 
    """
    p[0] = p[1]

decl_body is always reduced before the token following SEMI is shifted in, so my action gets executed at the correct time.
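A PLY-free simulation of why the split works (a hypothetical driver, not PLY itself): because decl_body reduces as soon as ';' becomes the lookahead, the type table is updated before any token after the ';' is classified.

```python
# PLY-free simulation of the split rule (hypothetical driver, not PLY):
# decl_body reduces while ';' is still the lookahead, so the type table
# is updated before any token AFTER the ';' is classified.
type_table = set()

def classify(name):
    return "TYPEID" if name in type_table else "ID"

words = ["typedef", "int", "my_type", ";", "my_type", "x", ";"]

stream = []
in_typedef = False
last_name = None
for word in words:
    if word == ";":
        if in_typedef and last_name:
            # The p_decl_body action runs here, with ';' as the lookahead --
            # one token earlier than the old 'declaration : ... SEMI' rule.
            type_table.add(last_name)
        in_typedef = False
        stream.append((";", ";"))
        continue
    stream.append((classify(word), word))
    if word == "typedef":
        in_typedef = True
    else:
        last_name = word

print(stream[4])  # ('TYPEID', 'my_type') -- line 2 now lexes correctly
```

Compared with the original grammar, the only change is where the reduction boundary falls; the token stream after the fix classifies the second my_type as TYPEID.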

Eli Bendersky