I'm learning some compiler theory and practice at the moment. Ruby is my everyday language of choice, so I went to look at its lexer and parser grammar. Does Ruby have a separate lexer? If so, which file is it described in?
A:
The Ruby source contains the file parse.y, which holds the grammar. I am relatively sure that Ruby uses a separate lexer (like most LR parsers). The lexer also appears to be stateful:
enum lex_state_e {
    EXPR_BEG,    /* ignore newline, +/- is a sign. */
    EXPR_END,    /* newline significant, +/- is an operator. */
    EXPR_ENDARG, /* ditto, and unbound braces. */
    EXPR_ARG,    /* newline significant, +/- is an operator. */
    EXPR_CMDARG, /* newline significant, +/- is an operator. */
    EXPR_MID,    /* newline significant, +/- is an operator. */
    EXPR_FNAME,  /* ignore newline, no reserved words. */
    EXPR_DOT,    /* right after `.' or `::', no reserved words. */
    EXPR_CLASS,  /* immediate after `class', no here document. */
    EXPR_VALUE   /* alike EXPR_BEG but label is disallowed. */
};
I guess this is necessary because a newline is ignored in some contexts and terminates an expression in others. Also, 'class' is not always a keyword, e.g. in 'x.class'.
But I'm no expert.
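To make the idea of a stateful lexer concrete, here is a minimal sketch in C. It is not Ruby's lexer: the names toy_state, toy_token, toy_lexer and toy_next are made up for illustration, and only two states are modelled (a simplified stand-in for EXPR_BEG and EXPR_END). It shows how the same character, a newline or a '-', can produce different tokens depending on the state left behind by the previous token.

```c
#include <ctype.h>

/* Hypothetical two-state subset of the idea behind lex_state_e. */
enum toy_state { TOY_BEG, TOY_END };
enum toy_token { TOK_EOF, TOK_NUM, TOK_OP, TOK_NEWLINE };

struct toy_lexer {
    const char *src;       /* remaining input */
    enum toy_state state;  /* context carried from one token to the next */
};

/* Reads decimal digits at lx->src and returns their value. */
static int toy_digits(struct toy_lexer *lx)
{
    int n = 0;
    while (isdigit((unsigned char)*lx->src))
        n = n * 10 + (*lx->src++ - '0');
    return n;
}

/* Returns the next token. *value receives the number for TOK_NUM
 * and the operator character for TOK_OP. */
enum toy_token toy_next(struct toy_lexer *lx, int *value)
{
    for (;;) {
        char c = *lx->src;
        if (c == '\0')
            return TOK_EOF;
        if (c == ' ' || c == '\t') {
            lx->src++;
            continue;
        }
        if (c == '\n') {
            lx->src++;
            if (lx->state == TOY_BEG)
                continue;              /* expression not finished: ignore */
            lx->state = TOY_BEG;
            return TOK_NEWLINE;        /* expression finished: significant */
        }
        if (c == '+' || c == '-') {
            lx->src++;
            if (lx->state == TOY_BEG && isdigit((unsigned char)*lx->src)) {
                /* At the beginning of an expression, +/- is a sign. */
                *value = (c == '-' ? -1 : 1) * toy_digits(lx);
                lx->state = TOY_END;
                return TOK_NUM;
            }
            *value = c;                /* otherwise it is an operator */
            lx->state = TOY_BEG;
            return TOK_OP;
        }
        if (isdigit((unsigned char)c)) {
            *value = toy_digits(lx);
            lx->state = TOY_END;
            return TOK_NUM;
        }
        lx->src++;                     /* skip anything this toy does not know */
    }
}
```

Starting in TOY_BEG, the input "1 -\n-2\n" is tokenized as the number 1, the operator '-', the number -2 and one newline token: the first newline is swallowed because the operator left the lexer in TOY_BEG, and the second '-' is read as a sign for the same reason.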
EDIT: Looking deeper into parse.y, the lexer is not completely separate from the parser. Some grammar actions set the lexer state directly:
superclass : //[...]
           | '<'
               {
                   lex_state = EXPR_BEG;
               }
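The same kind of feedback from parser to lexer can be sketched with a hand-written parser in C. Everything here is hypothetical (demo_lex_state, demo_lex, demo_parse_class_header are invented names, and the toy grammar is just IDENT, optionally followed by '<' IDENT); it only illustrates the pattern of a parser action changing a variable that the lexer consults, not how parse.y actually does it.

```c
#include <string.h>

/* Hypothetical names throughout; none of this is taken from parse.y. */
enum demo_state { DEMO_BEG, DEMO_END };
enum demo_token { D_EOF, D_LT, D_NEWLINE, D_IDENT };

static enum demo_state demo_lex_state = DEMO_END; /* shared by lexer and parser */
static const char *demo_src;                      /* remaining input */

/* Lexer: reads demo_lex_state but never changes it. */
static enum demo_token demo_lex(void)
{
    for (;;) {
        char c = *demo_src;
        if (c == '\0')
            return D_EOF;
        demo_src++;
        if (c == ' ')
            continue;
        if (c == '\n') {
            if (demo_lex_state == DEMO_BEG)
                continue;          /* parser said: an expression must follow */
            return D_NEWLINE;
        }
        if (c == '<')
            return D_LT;
        /* Anything else starts an identifier; consume the rest of it. */
        while (*demo_src && !strchr(" \n<", *demo_src))
            demo_src++;
        return D_IDENT;
    }
}

/* Parses IDENT [ '<' IDENT ].
 * Returns 1 if a superclass was found, 0 if there is none, -1 on error. */
int demo_parse_class_header(const char *text)
{
    enum demo_token t;

    demo_src = text;
    demo_lex_state = DEMO_END;
    if (demo_lex() != D_IDENT)
        return -1;
    t = demo_lex();
    if (t == D_NEWLINE || t == D_EOF)
        return 0;
    if (t != D_LT)
        return -1;
    demo_lex_state = DEMO_BEG;     /* the "parser action" after '<' */
    t = demo_lex();                /* newlines before the name are skipped */
    demo_lex_state = DEMO_END;
    return t == D_IDENT ? 1 : -1;
}
```

With this sketch, demo_parse_class_header("Foo <\n Bar\n") returns 1 because the parser switched the lexer to DEMO_BEG before asking for the next token. Without that assignment the lexer would return a newline token after '<' and the parse would fail.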
Ragmaanir
2010-10-25 18:03:02