I'm learning some compiler theory and practice at the moment. Ruby is my everyday language of choice, so I went to look at its lexer and parser grammar. Does Ruby have a separate lexer? If so, which file is it defined in?

A: 

The Ruby source contains the file parse.y, which holds the grammar. I am fairly sure that Ruby uses a separate lexer (like most LR parsers). The lexer also appears to be stateful:

    enum lex_state_e {
        EXPR_BEG,           /* ignore newline, +/- is a sign. */
        EXPR_END,           /* newline significant, +/- is an operator. */
        EXPR_ENDARG,        /* ditto, and unbound braces. */
        EXPR_ARG,           /* newline significant, +/- is an operator. */
        EXPR_CMDARG,        /* newline significant, +/- is an operator. */
        EXPR_MID,           /* newline significant, +/- is an operator. */
        EXPR_FNAME,         /* ignore newline, no reserved words. */
        EXPR_DOT,           /* right after `.' or `::', no reserved words. */
        EXPR_CLASS,         /* immediate after `class', no here document. */
        EXPR_VALUE          /* alike EXPR_BEG but label is disallowed. */
    };

I guess this is necessary because a newline is ignored in some cases and terminates an expression in others. Also, 'class' is not always a keyword, e.g. in 'x.class'.
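To make the idea concrete, here is a minimal sketch of a stateful lexer written in Ruby. It is a toy and not how parse.y actually works: the state names are borrowed from three entries of lex_state_e, while toy_lex, the RESERVED list, and the token names are made up for illustration.

```ruby
# Toy stateful lexer. A sketch only, NOT Ruby's real lexer.
# State names mirror lex_state_e; everything else is hypothetical.
RESERVED = %w[class].freeze

def toy_lex(src)
  tokens = []
  state = :EXPR_BEG
  # Whitespace other than "\n" is never matched, so it is skipped silently.
  src.scan(/\n|[A-Za-z_]\w*|\d+|[+\-.]/) do |text|
    case text
    when "\n"
      # At the start of an expression the newline is ignored.
      next if state == :EXPR_BEG
      tokens << [:NEWLINE, text]
      state = :EXPR_BEG
    when "."
      tokens << [:DOT, text]
      state = :EXPR_DOT
    when "+", "-"
      # A sign at the start of an expression, an operator after a value.
      tokens << [state == :EXPR_BEG ? :SIGN : :OPERATOR, text]
      state = :EXPR_BEG
    when /\A\d/
      tokens << [:NUMBER, text]
      state = :EXPR_END
    else
      # Right after a dot there are no reserved words.
      if RESERVED.include?(text) && state != :EXPR_DOT
        tokens << [:KEYWORD, text]
        state = :EXPR_BEG # simplification; the real enum has EXPR_CLASS here
      else
        tokens << [:IDENTIFIER, text]
        state = :EXPR_END
      end
    end
  end
  tokens
end

p toy_lex("x.class")   # `class` comes out as an identifier
p toy_lex("class Foo") # `class` comes out as a keyword
p toy_lex("1 +\n2\n")  # only the second newline produces a token
```

The same input text ("class", "\n", "-") yields different tokens depending on what came before it, which is all that "stateful" means here.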

But I'm no expert.

EDIT: Looking deeper into the parse.y file, the lexer is not completely separate from the parser. Grammar actions assign to lex_state directly:

superclass  : //[...]
    | '<'
        {
        lex_state = EXPR_BEG;
        }
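The feedback from parser to lexer can be sketched in Ruby as well. This is a hypothetical illustration, not a transcription of parse.y: ToyLexer and parse_class_header are invented names, and the toy lexer deliberately leaves its state alone on '<' so that the effect of the parser's assignment is visible.

```ruby
require "strscan"

# Toy pull-based lexer; the caller may overwrite `state` between tokens.
class ToyLexer
  attr_accessor :state

  def initialize(src)
    @scanner = StringScanner.new(src)
    @state = :EXPR_BEG
  end

  def next_token
    @scanner.skip(/\s+/)
    return nil if @scanner.eos?

    if (text = @scanner.scan(/[A-Za-z_]\w*/))
      @state = :EXPR_END
      [text == "class" ? :KEYWORD : :IDENTIFIER, text]
    elsif (text = @scanner.scan(/\d+/))
      @state = :EXPR_END
      [:NUMBER, text]
    elsif (text = @scanner.scan(/[+-]/))
      # Classification depends on whatever state is current.
      [@state == :EXPR_BEG ? :SIGN : :OPERATOR, text]
    else
      # Any other character; the state is left unchanged on purpose.
      [:PUNCT, @scanner.getch]
    end
  end
end

# Stand-in for the parser: pulls tokens and, like the grammar action,
# forces "expression start" right after '<'.
def parse_class_header(lexer, parser_sets_state: true)
  tokens = []
  while (token = lexer.next_token)
    tokens << token
    lexer.state = :EXPR_BEG if parser_sets_state && token == [:PUNCT, "<"]
  end
  tokens
end

p parse_class_header(ToyLexer.new("class Foo < -1"))
p parse_class_header(ToyLexer.new("class Foo < -1"), parser_sets_state: false)
```

With the assignment enabled, the '-' after '<' is tokenized as a sign; with it disabled, the toy lexer reports an operator. Because the parser changes how the next token is read, the lexer cannot be run as a fully independent pass.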
Ragmaanir