Is there any way to return multiple tokens in OCamlLex?

I'm trying to write a lexer and parser for an indentation-based language, and I would like my lexer to return multiple DEDENT tokens when it notices that the indentation level is lower than it was on the previous line. That would let it tell the parser that several blocks have ended at once.

With this approach, INDENT and DEDENT could serve as drop-in replacements for BEGIN and END, since each INDENT implies a BEGIN and each DEDENT implies an END.

+3  A: 

Have the lexer rule return a list of tokens. If the parser cannot handle that natively (ocamlyacc, for example, asks for one token per call), just insert a cache in between:

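(* Adapts a lexer rule that returns a token list (Lexer.tokens) to the
   one-token-per-call interface that ocamlyacc expects. *)
let cache =
  let l = ref [] in  (* tokens already lexed but not yet handed to the parser *)
  fun lexbuf ->
    match !l with
    | x::xs -> l := xs; x
    | [] ->
      (* Buffer is empty: lex the next batch and return its first token. *)
      (match Lexer.tokens lexbuf with
       | [] -> failwith "cache: Lexer.tokens returned an empty list"
       | x::xs -> l := xs; x)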
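
For the lexer side, here is a rough sketch of how the list of layout tokens for a new line could be computed with an indentation stack. Everything in it is an assumption made for illustration: the token type, the names indent_stack and layout_tokens, and the choice to emit NEWLINE when the indentation is unchanged (so that the list is never empty). A real Lexer.tokens rule would call something like this with the width of the leading whitespace it just matched.

```ocaml
(* Hypothetical token type; the real one would come from the ocamlyacc grammar. *)
type token = INDENT | DEDENT | NEWLINE

(* Stack of currently open indentation widths; the bottom entry 0 is never popped. *)
let indent_stack = ref [0]

(* Given the width of the current line's leading whitespace, return the
   layout tokens to emit before the line's first real token. *)
let layout_tokens width =
  let current = List.hd !indent_stack in
  if width > current then begin
    (* Deeper than before: open exactly one new block. *)
    indent_stack := width :: !indent_stack;
    [INDENT]
  end else begin
    (* Pop every level deeper than [width], emitting one DEDENT per level. *)
    let rec pop acc =
      match !indent_stack with
      | top :: rest when top > width ->
          indent_stack := rest;
          pop (DEDENT :: acc)
      | top :: _ when top = width -> acc
      | _ -> failwith "dedent does not match any enclosing indentation level"
    in
    match pop [] with
    | [] -> [NEWLINE]       (* same level as before: no block boundary *)
    | dedents -> dedents    (* one DEDENT per closed block *)
  end
```

Going from width 4 back to width 0 with levels 2 and 4 open yields two DEDENT tokens in one list, which the cache above then hands to the parser one at a time.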

Or you can run the lexer on the full document and then run the parser on the full token stream.
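
A minimal sketch of that second option, assuming the list-returning rule eventually produces an EOF token. The token type and the helper names lex_all and token_source are made up for this example; in real code the token type would be the one generated by ocamlyacc and the first argument to lex_all would be Lexer.tokens.

```ocaml
(* Hypothetical token type standing in for the one ocamlyacc generates. *)
type token = INDENT | DEDENT | IDENT of string | EOF

(* Call a list-returning lexer rule repeatedly and flatten the batches into
   one token list. Stops after the first batch that contains EOF. *)
let lex_all (next : Lexing.lexbuf -> token list) lexbuf =
  let rec go acc =
    let toks = next lexbuf in
    let acc = List.rev_append toks acc in   (* acc is kept in reverse order *)
    if List.mem EOF toks then List.rev acc else go acc
  in
  go []

(* Wrap a token list as a [lexbuf -> token] function, the shape a generated
   parser entry point takes. The lexbuf argument is ignored; once the list
   is exhausted the function keeps returning EOF. *)
let token_source tokens =
  let rest = ref tokens in
  fun (_ : Lexing.lexbuf) ->
    match !rest with
    | [] -> EOF
    | t :: ts -> rest := ts; t
```

With a parser entry point called, say, Parser.main (a hypothetical name), the call would look roughly like Parser.main (token_source (lex_all Lexer.tokens lexbuf)) (Lexing.from_string ""). One thing to watch: the parser reads positions from the lexbuf it is given, so with a dummy lexbuf the locations in error messages will not be meaningful unless you carry positions along with the tokens yourself.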

BTW, have you seen ocaml+twt? It is a preprocessor that gives OCaml itself an indentation-based syntax.

ygrek
Thanks, I will give that a try soon, and see if I can make that work for me. It might be a bit annoying, because the DEDENT token is the only one that can appear multiple times, but I can work around that.
Joe Bloggs