views:

291

answers:

2

Hi, I'm writing a lexer for a small language in Alex with Haskell.

The language is specified to have pythonesque significant indentation, with an INDENT token or a DEDENT token emitted whenever the indentation level changes.

In a traditional imperative language like C, you'd keep a global in the lexer and update it with the indentation level at each line.

This doesn't work in Alex/Haskell because I can't store any global data anywhere with Haskell, and I can't put all my lexing rules inside any monad or anything.

So, how can I do this? Is it even possible? Or will i have to write my own lexer and avoid using alex?

+3  A: 

I can't store any global data anywhere with Haskell

This is not true; in most cases something like the State monad is sufficient, but there is also the ST monad.

You don't need global state for this task, however. Writing a parser consists of two parts; lexical analysis and syntax analysis. The lexical analysis just turns a stream of characters into a stream of meaningful tokens. The syntax analysis turns tokens into an AST; this is where you should deal with indentation.

As you are interpreting the indentation, you will call a handler function as the indentation level changes -- when it increases (nesting), you call your handler function (perhaps with one arg incremented, if you want to track the indentation level); when the level decreases, you simply return the relevant AST portion from the function.

(As an aside, using a global variable for this is something that would not occur to me in an imperative language either -- if anything, it's an instance variable. The State monad is very similar conceptually to this.)

Finally, I think the phrase "I can't put all my lexing rules inside any monad" indicates some sort of misunderstanding of monads. If I needed to parse and keep global state, my code would look like:

data AST = ...
type Step = State Int AST

parseFunction :: Stream -> Step
parseFunction s = do
   level <- get
   ...
   if anotherFunction then put (level + 1) >> parseFunction ...
   else parseWhatever
   ...
   return node

parse :: Stream -> Step
parse s = do
   if looksLikeFunction then parseFunction ...

main = runState parse 0 -- initial nesting of 0

Instead of combining function applications with (.) or ($), you combine them with (>>=) or (>>). Other than that, the algorithm is the same. (There is no "monad" to be "inside".)

Finally, you might like applicative functors:

eval :: Environment -> Node -> Evaluated
eval e (Constant x) = Evaluated x
eval e (Variable x) = Evaluated (lookup e x)
eval e (Function f x y) = (f <$> (`eval` x) <*> (`eval` y)) e

(or

eval e (Function f x y) = ((`eval` f) <*> (`eval` x) <*> (`eval` y)) e

if you have something like "funcall"... but I digress.)

There is plenty of literature on parsing with applicative functors, monads, and arrows; all of which have the potential to solve your problem. Read up on those and see what you get.

jrockway
i'm quite familiar with programming language design, just not too familiar with haskell and alex. Thanks for the information, but I already knew about just about all of that.By "not putting global state" I meant specifically NOT putting it into a state monad but rather having some mutable state. And by "not putting my lexical rules into a monad" I meant I was unable to thread a state monad through Alex's rules, which, turns out, was actually achievable. Most of what you said came across to me as kinda unhelpful, but I was able to solve the problem anyway with help from #haskell.Thanks anyway
kamatsu
Also layout is almost always done within the lexer, both Python and haskell do that.
kamatsu
What about the state monad is not "mutable state"? Although technically immutable, you certainly don't notice that until you read the source code for (>>=).
jrockway
+4  A: 

Note that in other whitespace-sensitive languages -- like Haskell -- the layout handling is indeed done in the lexer. GHC in fact implements layout handling in Alex. Here's the source:

http://darcs.haskell.org/ghc/compiler/parser/Lexer.x

There are some serious errors in your question that lead you astray, as jrockway points out. "I can't store any global data anywhere with Haskell" is on the wrong track. Firstly, you can have global state, secondly, you should not be using global state here, when Alex fully supports state transitions in rules in a safe manner.

Look at the AlexState structure that Alex provides, letting you thread state through your lexer. Then, look at how the state is used in GHC's layout implementation to implement indent/unindent of the layout rules. (Search for "-- Layout processing" in GHC's lexer to see how the state is pushed and popped).

Don Stewart
Nice. I ended up playing around with Alex a bit and found that it is cleaner in some ways than PArrows (which is normally what I reach for). Thanks for the info :)
jrockway
Ah, thanks for this. I asked on #haskell too, discovered the UserState wrapper for alex. Not much docs on it though, had to do some source hunting.I know about the State monad, but I was unsure about threading state through alex's lexer.Thanks for the help.
kamatsu