First, I understand the how of iteratees, well enough that I could probably write a simplistic and buggy implementation without referring back to any existing ones.

What I'd really like to know is why people seem to find them so fascinating, or under what circumstances their benefits justify their complexity. Compared to lazy I/O there is a very clear benefit, but that comparison seems an awful lot like a straw man to me. I never felt comfortable about lazy I/O in the first place, and I avoid it except for the occasional hGetContents or readFile, mostly in very simple programs.

In real-world scenarios I generally use traditional I/O interfaces with control abstractions appropriate to the task. In that context I just don't see the benefit of iteratees, or to what task they are an appropriate control abstraction. Most of the time they seem more like unnecessary complexity or even a counterproductive inversion of control.

I've read a fair number of articles about them and sources that make use of them, but have not yet found a compelling example that actually made me think anything along the lines of "oh, yeah, I'd have used them there too." Maybe I just haven't read the right ones. Or perhaps there is a yet-to-be-devised interface, simpler than any I've yet seen, that would make them feel less like a Swiss Army Chainsaw.

Am I just suffering from not-invented-here syndrome or is my unease well-founded? Or is it perhaps something else entirely?

+3  A: 

under what circumstances their benefits justify their complexity

Every language has strict (classical) IO, where all resources are managed by the user. Haskell also provides ubiquitous lazy IO, where all resource management is delegated to the system.

However, that can create problems, as the scope of resources is dependent on runtime demand properties.

Iteratees strike a third way:

  • High level abstractions, like lazy IO.
  • Explicit, lexical scoping of resources, like strict IO.

Iteratees are justified when you have complex IO processing tasks but very tight bounds on resource use. An example is a web server.

Indeed, Snap is built around iteratee IO on top of epoll.

Don Stewart
Part of my problem is that to me they do not seem high-level at all. So far, all the tutorials and explanations I have seen seem to require one to understand how they are implemented in order to use them. There was a thread on Haskell-cafe a while back attempting to establish some kind of implementation-independent claims about their semantics, but I don't recall it having any sort of satisfying conclusion. Perhaps sitting down and making myself use them for something non-trivial would change my impression, but at this point I don't even know what sort of task I would actually use them _for_.
mokus
I'll definitely take a look at the code for snap, and any other good examples anyone would care to cite. After all, what I'm really after is a practical understanding of how and why to use them.
mokus
+4  A: 

As to why people find them so fascinating, I think because they're such a simple idea. The recent discussion on Haskell-cafe about a denotational semantics for iteratees devolved into a consensus that they're so simple they're barely worth describing. The phrase "little more than a glorified left-fold with a pause button" sticks out to me from that thread. People who like Haskell tend to be fond of simple, elegant structures, so the iteratee idea is likely very appealing.
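To make the "left-fold with a pause button" phrase concrete, here is a minimal sketch. It is not the API of any published iteratee package; the names Stream, Iteratee, foldI, feed and run are my own, and real implementations also track unconsumed input, errors and an underlying monad.

```haskell
import Data.List (foldl')

-- Input arrives in chunks, followed by an end-of-input marker.
data Stream a = Chunk [a] | EOF

-- An iteratee is either finished with a result, or paused until it
-- is handed the next piece of the stream (the "pause button").
data Iteratee a b
  = Done b
  | Cont (Stream a -> Iteratee a b)

-- Lift an ordinary strict left fold into an iteratee (hypothetical helper).
foldI :: (b -> a -> b) -> b -> Iteratee a b
foldI f = go
  where
    go acc = acc `seq` Cont step
      where
        step (Chunk xs) = go (foldl' f acc xs)
        step EOF        = Done acc

-- Resume a paused iteratee with one chunk; a finished one ignores it.
feed :: Iteratee a b -> [a] -> Iteratee a b
feed (Cont k) xs = k (Chunk xs)
feed done     _  = done

-- Signal end of input and extract the result, if the iteratee yields one.
run :: Iteratee a b -> Maybe b
run (Done b) = Just b
run (Cont k) = case k EOF of
  Done b -> Just b
  Cont _ -> Nothing  -- the iteratee refused to finish on EOF
```

In GHCi, run (feed (feed (foldI (+) 0) [1,2,3]) [4,5]) folds over both chunks; between the two feed calls the computation is just a value sitting in memory, which is what lets something else decide when and from where the next chunk comes.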

For me, the chief benefits of iteratees are

  1. Composability. Not only can iteratees be composed, but enumerators can too. This is very powerful.
  2. Safe resource usage. Resources (memory and handles mostly) cannot escape their local scope. Compare to strict I/O, where it's easier to create space leaks by not cleaning up.
  3. Efficiency. Iteratees can be highly efficient; competitive with or better than both lazy I/O and strict I/O.

I have found that iteratees provide the greatest benefits when working with single logical data that comes from multiple sources. This is when the composability is most helpful, and resource management with strict I/O most annoying (e.g. nested allocas or brackets).

For an example, in a work-in-progress audio editor, a single logical chunk of sound data is a set of offsets into multiple audio files. I can process that single chunk of sound by doing something like this (from memory, but I think this is right):

enumSound :: MonadIO m => Sound -> Enumerator s m a
enumSound snd = foldr (>=>) enumEof . map enumFile $ sndFiles snd

This seems clear, concise, and elegant to me, much more so than the equivalent strict I/O. Iteratees are also powerful enough to incorporate any processing I want to do, including writing output, so I find this very nice. If I used lazy I/O I could get something as elegant, but the extra care to make sure resources are consumed and GC'd would outweigh the advantages IMO.
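For readers who want to see the shape of that composition without the library, here is a self-contained sketch of the same foldr (>=>) enumEof pattern. The types are simplified stand-ins that I made up for illustration (a pure Iteratee and an IO-based Enumerator), and enumLines, enumFiles, foldI and result are hypothetical names, not functions from the iteratee package.

```haskell
import Control.Monad ((>=>))
import System.IO

-- A passive consumer: finished, or waiting for an element (Nothing = end).
data Iteratee a b = Done b | Cont (Maybe a -> Iteratee a b)

-- An enumerator pushes input into an iteratee and hands it back.
type Enumerator a b = Iteratee a b -> IO (Iteratee a b)

-- Feed the lines of one file. The handle is opened and closed here,
-- so it cannot outlive this enumerator; no end-of-input is sent.
enumLines :: FilePath -> Enumerator String b
enumLines path it = withFile path ReadMode (loop it)
  where
    loop (Cont k) h = do
      eof <- hIsEOF h
      if eof
        then return (Cont k)
        else do
          line <- hGetLine h
          loop (k (Just line)) h
    loop done _ = return done  -- consumer finished early: stop reading

-- Tell the iteratee there is no more input.
enumEof :: Enumerator a b
enumEof (Cont k) = return (k Nothing)
enumEof done     = return done

-- Several files presented as one logical stream.
enumFiles :: [FilePath] -> Enumerator String b
enumFiles = foldr (>=>) enumEof . map enumLines

-- A strict left fold as a consumer, and a way to read off its result.
foldI :: (b -> a -> b) -> b -> Iteratee a b
foldI f acc = acc `seq` Cont step
  where
    step (Just x) = foldI f (f acc x)
    step Nothing  = Done acc

result :: Iteratee a b -> Maybe b
result (Done b) = Just b
result (Cont _) = Nothing
```

With these definitions, enumFiles paths (foldI (\n _ -> n + 1) 0) counts the lines across all files while only ever holding one file open at a time.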

I also like that you need to explicitly retain data in iteratees, which avoids the notorious mean xs = sum xs / fromIntegral (length xs) space leak.
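The leak comes from traversing the list twice, which keeps the whole list alive until the second traversal is done. A consumer that sees each element only once has to carry both the running sum and the count itself. A plain-list sketch of that single-pass version (MeanAcc, meanStep and meanResult are names I made up for illustration):

```haskell
import Data.List (foldl')

-- Strict accumulator: running sum and element count.
data MeanAcc = MeanAcc !Double !Int

-- Consume one element, updating both fields at once.
meanStep :: MeanAcc -> Double -> MeanAcc
meanStep (MeanAcc s n) x = MeanAcc (s + x) (n + 1)

-- The mean is undefined for empty input.
meanResult :: MeanAcc -> Maybe Double
meanResult (MeanAcc _ 0) = Nothing
meanResult (MeanAcc s n) = Just (s / fromIntegral n)

-- One traversal; nothing forces the input to be retained after it
-- has been consumed.
mean :: [Double] -> Maybe Double
mean = meanResult . foldl' meanStep (MeanAcc 0 0)
```

The step function and initial accumulator are exactly what one would hand to a fold-style iteratee, so the same logic works whether the elements come from a list or from chunks of a file.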

Of course, I don't use iteratees for everything. As an alternative I really like the with* idiom, but when you have multiple resources that need to be nested that gets complex very quickly.

John
I also make frequent use of the with* idiom. It's not so much the nesting as the overlapping non-nesting cases that it doesn't handle so well (e.g., acquire A, acquire B, release A, release B, with things happening in between). Which makes me wonder, do iteratees handle this scenario any better?
mokus
BTW, your enumSound example is pretty spot-on in terms of what I'm looking for - an example of (real) code where iteratees are doing something clear and simple that wouldn't be quite so clear and simple without them.
mokus
@mokus: Quite possibly--iteratees are incremental folds with suspend/resume behavior, each step driven by an external data source, so much resource management can be naturally abstracted out of the core logic. Then it just becomes a matter of composing data sources (i.e., enumerators).
camccann
+3  A: 

Essentially, it's about doing IO in a functional style, correctly and efficiently. That's all, really.

Correct and efficient are easy enough using quasi-imperative style with strict IO. Functional style is easy with lazy IO, but it's technically cheating (using unsafeInterleaveIO under the hood) and can have issues with resource management and efficiency.

In very, very general terms, a lot of pure functional code follows a pattern of taking some data, recursively expanding it into smaller pieces, transforming the pieces in some fashion, then recombining it into a final result. The structure may be implicit (in the call graph of the program) or an explicit data structure being traversed.

But this falls apart when IO is involved. Say your initial data is a file handle, the "recursively expand" step is reading a line from it, and you can't read the entire file into memory at once. This forces the entire read-transform-recombine process to be done for each line before reading the next one, so instead of the clean "unfold, map, fold" structure they get mashed together into explicitly recursive monadic functions using strict IO.
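As a small illustration of that "mashed together" shape, here is a sketch that sums the lengths of all lines in a file using strict IO. The function name totalLineLength is made up for the example; the System.IO calls are the standard ones.

```haskell
import System.IO

-- Sum the lengths of all lines in a file, one line at a time.
-- Reading (the "unfold"), measuring (the "map") and adding (the
-- "fold") are all interleaved inside one hand-written loop.
totalLineLength :: FilePath -> IO Int
totalLineLength path = withFile path ReadMode (loop 0)
  where
    loop acc h = do
      eof <- hIsEOF h
      if eof
        then return acc
        else do
          line <- hGetLine h
          let acc' = acc + length line
          acc' `seq` loop acc' h  -- force the sum to avoid building thunks
```

It is correct and runs in constant space, but changing what is computed per line, or where the lines come from, means editing the loop itself.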

Iteratees provide an alternative structure to solve the same problem. The "transform and recombine" steps are extracted and, instead of being functions, are changed into a data structure representing the current state of the computation. The "recursively expand" step is given the responsibility of obtaining the data and feeding it to an (otherwise passive) iteratee.

What benefits does this offer? Among other things:

  • Because iteratees are passive objects that perform single steps of a computation, they can be easily composed in different ways--for instance, interleaving two iteratees instead of running them sequentially.
  • The interface between iteratees and enumerators is pure, just a stream of values being processed, so a pure function can be freely spliced in between them.
  • Data sources and computations are oblivious to each other's internal workings, decoupling input and resource management from processing and output.
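A rough sketch of these three points, using a deliberately tiny element-at-a-time iteratee type rather than any real library's. All names here (Iteratee, foldI, zipI, mapInput, enumList, run) are assumptions for illustration only.

```haskell
-- A passive consumer: finished, or waiting for an element (Nothing = end).
data Iteratee a b = Done b | Cont (Maybe a -> Iteratee a b)

-- A strict left fold as an iteratee.
foldI :: (b -> a -> b) -> b -> Iteratee a b
foldI f acc = acc `seq` Cont step
  where
    step (Just x) = foldI f (f acc x)
    step Nothing  = Done acc

-- Hand one input to an iteratee; a finished one ignores it.
push :: Maybe a -> Iteratee a b -> Iteratee a b
push i (Cont k) = k i
push _ done     = done

-- Interleave two iteratees: each input is given to both.
zipI :: Iteratee a b -> Iteratee a c -> Iteratee a (b, c)
zipI (Done b) (Done c) = Done (b, c)
zipI l r = Cont (\i -> zipI (push i l) (push i r))

-- Splice a pure function in front of an iteratee.
mapInput :: (a -> b) -> Iteratee b c -> Iteratee a c
mapInput _ (Done c) = Done c
mapInput f (Cont k) = Cont (mapInput f . k . fmap f)

-- A data source that knows nothing about what consumes its elements.
enumList :: [a] -> Iteratee a b -> Iteratee a b
enumList (x:xs) (Cont k) = enumList xs (k (Just x))
enumList _      it       = it

-- Signal end of input and extract the result, if any.
run :: Iteratee a b -> Maybe b
run (Done b) = Just b
run (Cont k) = case k Nothing of
  Done b -> Just b
  Cont _ -> Nothing
```

For example, run (enumList [1,2,3,4] (zipI (foldI (+) 0) (foldI (\n _ -> n + 1) 0))) computes a sum and a count in a single pass over the source, and enumList could be swapped for a file-backed source without touching either fold.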

The end result is that a program can have a high-level structure much closer to what a pure functional version would look like, with many of the same benefits to compositionality, while simultaneously having efficiency comparable to the more imperative, strict IO version.

As for being "worth the complexity"? Well, that's the thing--they're really not that complex, just a bit new and unfamiliar. The idea's been floating around for only, what, a couple years? Give it some time for things to shake out as people use iteratee-based IO in larger projects (e.g., with things like Snap), and for more examples/tutorials to appear. It's likely that, in hindsight, the current implementations will seem very rough around the edges.


Somewhat related: You may want to read this discussion about functional-style IO. Iteratees aren't mentioned all that much, but the central issue is very similar. In particular, see this solution, which is both very elegant and goes even further than iteratees in abstracting incremental IO.

camccann
That is an interesting (and useful) perspective. I agree that the concept behind iteratees is simple, but the existing generic implementations still seem quite complex to me. That disconnect is tough for me to swallow. Very good point also about the newness of the iteratee concept. That, together with the sparse documentation, is one of the main reasons I'm looking for real-world examples rather than simply dismissing the idea.
mokus