views: 160
answers: 3

Hi,

I'm trying to implement a Python parser using PLY for the Kconfig language, which is used to generate the configuration options for the Linux kernel.

There's a keyword called source which performs an inclusion, so what I do is that when the lexer encounters this keyword, I change the lexer state and create a new lexer which is going to lex the sourced file:

def t_begin_source(t):
    r'source '
    t.lexer.begin('source')

def t_source_path(t):
    r'[^\n]+\n+'
    t.lexer.begin('INITIAL')
    global path
    # Build a fresh lexer for the sourced file
    source_lexer = lex.lex(errorlog=lex.NullLogger())
    source_file_name = path + t.value.strip(' \"\n')
    sourced_file = file(source_file_name).read()

    source_lexer.input(sourced_file)

    # The tokens produced here are consumed and discarded --
    # they never reach the parser, which is exactly the problem
    while True:
        tok = source_lexer.token()
        if not tok:
            break

Somewhere else I have this line:

lexer = lex.lex(errorlog=lex.NullLogger())

This is the "main" or "root" lexer which is going to be called by the parser.

My problem is that I don't know how to tell the parser to use a different lexer or to tell the "source_lexer" to return something...

Maybe the clone function should be used...

Thanks

+2  A: 

I don't know about the details of PLY, but in other systems like this that I've built, it made the most sense to have a single lexer which managed the stack of include files. So the lexer would return a unified stream of tokens, opening and closing include files as they were encountered.
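(For illustration, a minimal sketch of that idea in PLY terms -- the IncludeLexer name and structure are hypothetical, not from this answer. yacc only needs an object with token() and input() methods, so a wrapper can manage the stack of open files:)

import ply.lex as lex

class IncludeLexer(object):
    def __init__(self, root_lexer):
        self.stack = [root_lexer]        # innermost file on top

    def input(self, data):
        # yacc calls this with the top-level source text
        self.stack[0].input(data)

    def push_file(self, filename):
        # Call this from the rule that recognizes the 'source' keyword
        sub = self.stack[-1].clone()
        sub.input(open(filename).read())
        self.stack.append(sub)

    def token(self):
        # Serve tokens from the innermost file; pop when one is exhausted
        while self.stack:
            tok = self.stack[-1].token()
            if tok is not None:
                return tok
            self.stack.pop()
        return None

# Usage: result = yacc.parse(kconfig, lexer=IncludeLexer(lex.lex()))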

Ned Batchelder
You can use the lexer's `input` method to reset the lexer and provide new input. After the included file is done you have to go back to the original file where you left off (more or less.)
S.Lott
Yep, but I don't know how to return these tokens to the parser.
LB
A: 

OK, so what I've done is build a list of all the tokens before the actual parsing.

The parser no longer calls the lexer, because you can override the function the parser uses to fetch tokens via the tokenfunc parameter when calling the parse function:

result = yacc.parse(kconfig, debug=1, tokenfunc=my_function)

My function, which is now the one called to get the next token, iterates over the list of tokens built beforehand.
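(The token function itself isn't shown in the post; a minimal version might look like this -- the index bookkeeping is my own guess, not LB's code:)

token_list = []      # filled during the pre-parse lexing pass
token_index = [0]    # mutable cell so the function can advance it

def my_function():
    if token_index[0] < len(token_list):
        tok = token_list[token_index[0]]
        token_index[0] += 1
        return tok
    return None      # tells yacc the input is exhausted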

As for the lexing, when I encounter a source keyword, I clone my lexer and change its input to lex the included file.

def sourcing_file(source_file_name):
    print "SOURCE FILE NAME", source_file_name
    sourced_file = file(source_file_name).read()
    # Clone the root lexer so its rules and state are reused on the new input
    source_lexer = lexer.clone()
    source_lexer.input(sourced_file)

    # Drain the cloned lexer, appending its tokens to the global list
    while True:
        tok = source_lexer.token()
        if not tok:
            break
        token_list.append(tok)
    print 'END OF SOURCING FILE'
LB
+2  A: 

By an interesting coincidence, a link from the same Google search that led me to this question explains how to write your own lexer for a PLY parser. The post explains it simply and well: it's a matter of four instance variables and a single token method.
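(For illustration, my reading of that interface, not the linked post's code: yacc consumes tokens carrying four attributes -- type, value, lineno, lexpos -- from any object exposing a token() method:)

class Token(object):
    # The four attributes yacc expects on every token
    def __init__(self, type, value, lineno, lexpos):
        self.type = type        # a name from the grammar's token list
        self.value = value
        self.lineno = lineno
        self.lexpos = lexpos

class MyLexer(object):
    def __init__(self, tokens):
        self.tokens = iter(tokens)
    def token(self):
        # Return the next Token, or None at end of input
        return next(self.tokens, None)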

quark
I've done that too... but in the end, the solution of building a complete list of tokens and using my own getToken function wasn't so bad...
LB