views:

2484

answers:

7

God I hate the term "code smell", but I can't think of anything more accurate.

I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).

During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.

I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).

I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?

+3  A: 

Well, don't use monads. The power of functional programming is function purity and their reuse. There's this paper a professor of mine once wrote and he's one of the guys who helped build Haskell.

The paper is called "Why functional programming matters", I suggest you read through it. It's a good read.

John Leidegren
I thought overuse of monads was a code smell. There are plenty of online Haskell tutorials, but very little about good Haskell programming. Much thanks for the paper.
Cybis
Only the IO monad is in any way impure.
Peter Burns
Note that John Huges, the author of "Why Functional Programming Matters," is also the author of "Generalizing Monads to Arrows," which is also a way of handling state.
Curt Sampson
Terrible answer. When handling state, the alternative to using monads is threading the state through every single function. Monads are a very handy mechanism, and the state monad is in no way unpure!
freedrull
Monads are overrated ;)
John Leidegren
A: 

let's be careful about the terminology here. State is not per se bad; functional languages have state. What is a "code smell" is when you find yourself wanting to assign variables values and change them.

Of course, the Haskell state monad is there for just that reason -- as with I/O, it's letting you do unsafe and un-functional things in a constrained context.

So, yes, it's probably a code smell.

Charlie Martin
Why does that myth gets repeated?Most of the traditional monads are neither "unsafe", nor "un-functional". Even IO, which implementation /implementation/ is actually un-functional, as it is handled specifically by compiler, could be considered such.
squadette
Well, primarily because, by the definition, it's true. Even in the most sophisticated compiler, the I/O component must, by necessity, have side-effects: the state of the underlying device must change, or else no I/O has occurred.
Charlie Martin
@Charlie Martin: The IO monad (and IIRC the ST monad) is internally un-functional for the purpose of performance. However, it doesn't have to be. The runtime environment (C code) can simply execute the monad without the Haskell code ever doing anything unsafe or un-functional. All the other monads are not at all unsafe or un-functional.
Zifre
+3  A: 

Have you looked at Attribute grammars (AG)? (More info on wikipedia and an article in the Monad Reader)?

With AG you can add attributes to a syntax tree. These attributes are separated in synthesized and inherited attributes.

Synthesized attributes are things you generate (or synthesize) from your syntax tree, this could be the generated code, or all comments, or whatever else your interested in.

Inherited attributes are input to your syntax tree, this could be the environment, or a list of labels to use during code generation.

At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates haskell code (.hs files) from the provided .ag files.


Although, if you're still learning Haskell, then maybe this is not the time to start learning yet another layer of abstraction over that.

In that case, you could manually write the sort of code that attributes grammars generate for you, for example:

data AbstractSyntax = Literal Int | Block AbstractSyntax
                    | Comment String AbstractSyntax

compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x) _      = (generateCode x, [])
compile (Block ast) (l:ls) = let (code', comments) = compile ast ls
                             in (labelCode l code', comments)
compile (Comment s ast) ls = let (code, comments') = compile ast ls
                             in (code, s : comments')

generateCode :: Int -> Code
labelCode :: Label -> Code -> Code
Tom Lokhorst
Certainly something to keep in mind, thanks. My language really isn't all that complex though - it's a lot like lisp but without macros, lists, or higher-order functions (these would be hard to translate to Whitespace). Using attribute grammars might be a little overkill.
Cybis
In that case you could definitely use the manual AG pattern. It would eliminate the need for the State monad.
Tom Lokhorst
+13  A: 

I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract---don't make it obvious you're using a monad.

Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:

empyGraph    :: Graph
mkLabel      :: Label -> Graph
mkAssignment :: Assignment -> Graph  -- modify a register or memory
mkTransfer   :: ControlTransfer -> Graph   -- any control transfer
(<*>)        :: Graph -> Graph -> Graph

But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:

withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
             -> Graph   -- code in the 'then' branch
             -> Graph   -- code in the 'else' branch 
             -> Graph   -- resulting if-then-else construct

The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.

The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:

type Graph = RealGraph -> [Label] -> (RealGraph, [Label])

or a bit more accurately

type Graph = RealGraph -> State [Label] RealGraph
  -- a Graph is a monadic function from a successor RealGraph to a new RealGraph

With the state monad hidden behind a layer of abstraction, it's not smelly at all!

Norman Ramsey
+3  A: 

It's possible that you may want an applicative functor instead of a monad:

http://www.haskell.org/haskellwiki/Applicative_functor

I think the original paper explains it better than the wiki, however:

http://www.soi.city.ac.uk/~ross/papers/Applicative.html

jrockway
+1  A: 

In general you should try to avoid state wherever possible, but that's not always practical. Applicative makes effectful code look nicer and more functional, especially tree traversal code can benefit from this style. For the problem of name generation there is now a rather nice package available: value-supply.

nominolo
+7  A: 

I'd say that state in general is not a code smell, so long as it's kept small and well controlled.

This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.

An fairly clear example of this was when my team was working on our entry for the [ICFP Programming Contest 2009][http://icfpcontest.org/]. We ended up with several different modular parts to this:

  • VM: the virtual machine that ran the simulation program
  • Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
  • Solution: generation of the solution file based on the output of the controllers
  • Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed

Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.

The key point here was that the the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.

All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).

When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.

One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten in to it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly and in one place all of a small amount of code that can change the state, and more easily reason about how it can be changed, again making it easier to deal with.

Curt Sampson
Very nice answer, though I'm not sure it's proper to change the "accepted" answer two months later. Much thanks regardless.
Cybis
I've added a new paragraph to show why I think the accepted answer is, on further reflection, actually the wrong answer. I think you should change it; I'm always annoyed by wrong answers sitting at the top of the list.
Curt Sampson
I accepted based on the paper he linked to, not the answer itself (I agree the answer alone was quite poor). Now that I've thought about it again, that probably wasn't a good enough reason. Proper use of monads is about both modularity (your answer) and abstraction (Norman Ramsey's answer). If I accept this, both will be at the top. That would be best.
Cybis