I'm trying to get a deeper understanding of laziness in Haskell.

I was imagining the following snippet today:

data Image = Image { name :: String, pixels :: String }

image :: String -> IO Image
image path = Image path <$> readFile path

The appeal here is that I could simply create an Image value and pass it around; if I need the image data it would be read lazily, and if not, the time and memory cost of reading the file would be avoided:

main = do
  image <- image "file"
  putStrLn $ show $ length $ pixels image

But is that how it actually works? How is laziness compatible with IO? Will readFile be called regardless of whether I access pixels image or will the runtime leave that thunk unevaluated if I never refer to it?

If the image is indeed read lazily, then isn't it possible I/O actions could occur out of order? For example, what if immediately after calling image I delete the file? Now the putStrLn call will find nothing when it tries to read.

+10  A: 

How is laziness compatible with I/O?

Short answer: It isn't.


Long answer: IO actions are strictly sequenced, for pretty much the reasons you're thinking of. Any pure computations done with the results can be lazy, of course; for instance, if you read in a file, do some processing, and then print out some of the results, it's likely that any processing not needed by the output won't be evaluated. With strictly sequenced I/O, however, the entire file will be read, even the parts you never use. If you want lazy I/O, you have roughly two options:

  • Roll your own explicit lazy-loading routines and such, like you would in any strict language. Seems annoying, granted, but on the other hand Haskell makes a fine strict, imperative language. If you want to try something new and interesting, try looking at Iteratees.

  • Cheat like a cheating cheater. Functions such as hGetContents will do lazy, on-demand I/O for you, no questions asked. What's the catch? It (technically) breaks referential transparency. Pure code can indirectly cause side effects, and funny things can happen involving ordering of side effects if your code is really convoluted. hGetContents and friends are implemented using unsafeInterleaveIO, which is... exactly what it says on the tin. It's nowhere near as likely to blow up in your face as using unsafePerformIO, but consider yourself warned.
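To make the "cheating" concrete, here is a minimal sketch of what `unsafeInterleaveIO` does to ordering. It is not how `hGetContents` is written; `interleaveDemo` and the `IORef` setup are hypothetical names chosen for illustration. The deferred read is sequenced in the source before the write, yet it only runs when its result is forced, so it observes the later value.

```haskell
import Control.Exception (evaluate)
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical demo: returns (value read eagerly, value read on demand).
interleaveDemo :: IO (Int, Int)
interleaveDemo = do
  ref      <- newIORef (0 :: Int)
  eager    <- readIORef ref                       -- runs right here, sees 0
  deferred <- unsafeInterleaveIO (readIORef ref)  -- postponed until `deferred` is forced
  writeIORef ref 42
  forced   <- evaluate deferred                   -- the postponed read runs now
  return (eager, forced)

main :: IO ()
main = interleaveDemo >>= print   -- prints (0,42)
```

The second component is 42 even though its `readIORef` appears before the `writeIORef`, which is the kind of reordering lazy file reads rely on.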

camccann
Thanks for this answer! In fact, it was RWH's description of hGetContents that confused me about this issue. I didn't realize that it was a special case and used unsafe IO calls underneath. So basically, my example reads the file as soon as the readFile action is processed? That seems a lot more consistent if so.
Bill
@Bill: Here's the implementation for readFile, straight from GHC's standard libraries: `readFile name = openFile name ReadMode >>= hGetContents` So no, your example falls under the "cheating cheater" category. That said, the lazy I/O functions are usually safe enough for most day-to-day practical use, so don't sweat it too much unless purity is very important to you.
camccann
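If you do want the file read as soon as the action runs, one possible sketch is to force the contents before building the record. `imageStrict`, `strictDemo`, and the scratch file name are assumptions made for this example, not part of the question; forcing `length` is just one simple way to walk the whole string.

```haskell
import Control.Exception (evaluate)

data Image = Image { name :: String, pixels :: String }

-- Hypothetical strict variant of the question's `image`.
imageStrict :: FilePath -> IO Image
imageStrict path = do
  contents <- readFile path          -- still lazy at this point
  _ <- evaluate (length contents)    -- walks the whole string, so the file is fully read now
  return (Image path contents)

-- Hypothetical demo using a scratch file in the current directory.
strictDemo :: IO Int
strictDemo = do
  let path = "strict-image-demo.txt"
  writeFile path "abcdef"
  img <- imageStrict path
  writeFile path ""                  -- the pixels were already read, so this cannot affect them
  return (length (pixels img))

main :: IO ()
main = strictDemo >>= print          -- prints 6
```

Once the contents have been read to the end, the handle opened by `readFile` is closed, which is why the file can be overwritten afterwards without disturbing `pixels img`.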
I know Oleg says `unsafeInterleaveIO` breaks referential transparency, but I disagree. I would say it's merely nondeterministic, like many things in the `IO` monad. Does `getCurrentTime` break referential transparency because I can use it to determine which of two extensionally equal functions is implemented more efficiently?
Reid Barton
@Reid Barton: The issue with `unsafeInterleaveIO` is that it lets nondeterminism potentially leak out of `IO` in limited ways. To the best of my knowledge, no one has demonstrated that this particular matter can cause meaningful problems in even somewhat reasonable code, but that's merely "proof by lack of counterexamples". Oleg's contrived example of a pure function whose result depends on the order in which its arguments are evaluated was persuasive enough for me on principle. In practice, it's not that significant.
camccann
+5  A: 

Lazy I/O breaks Haskell's purity. The results from readFile are indeed produced lazily, on demand. The order in which I/O actions occur is not fixed, so yes, they could occur "out of order". The problem of deleting the file before pulling the pixels is real. In short, lazy I/O is a great convenience, but it's a tool with very sharp edges.
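One of those sharp edges can be sketched as follows: the handle is closed before anything has demanded the contents, so the real read is attempted too late. `lateRead` and the scratch file name are hypothetical. The exact outcome depends on the version of the base library: as far as I know, newer versions raise an exception for a delayed read on a closed handle, while older ones silently yield an empty string, so the sketch reports whichever happens rather than assuming one.

```haskell
import Control.Exception (SomeException, evaluate, try)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Hypothetical demo: close the handle before the lazy contents are consumed.
lateRead :: FilePath -> IO (Either SomeException Int)
lateRead path = do
  writeFile path "abcdef"
  -- withFile closes the handle as soon as hGetContents returns,
  -- and at that point nothing has been read yet.
  contents <- withFile path ReadMode hGetContents
  try (evaluate (length contents))   -- the actual read is attempted only here

main :: IO ()
main = do
  result <- lateRead "lazy-io-demo.txt"
  case result of
    Left err -> putStrLn ("lazy read failed: " ++ show err)
    Right n  -> putStrLn ("read " ++ show n ++ " characters")
```

Either way, the six characters written to the file are not what the program sees, because the read was deferred past the point where the handle was still open.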

The book Real World Haskell has a lengthy treatment of lazy I/O and goes over some of the pitfalls.

Norman Ramsey