In Real World Haskell, there is a section titled "Life without arrays or hash tables" where the authors suggest that lists and trees are preferred in functional programming, whereas an array or a hash table might be used instead in an imperative program.

This makes sense, since it's much easier to reuse part of an (immutable) list or tree when creating a new one than to do so with an array.
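To make that concrete, here is a small sketch (names are made up) of why reuse is cheap: prepending to an immutable list is O(1) because the new list simply shares the old one as its tail, with no copying.

```haskell
-- Prepending to an immutable list shares the old list as the tail
-- of the new one; nothing is copied.
shared :: [Int]
shared = [2, 3, 4]

extended :: [Int]
extended = 1 : shared  -- O(1): `shared` is reused, not duplicated

main :: IO ()
main = print extended  -- [1,2,3,4]
```

Doing the equivalent with an array would require copying all the elements into fresh storage.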

So my questions are:

  • Are there really significantly different usage patterns for data structures between functional and imperative programming?
  • If so, is this a problem?
  • What if you really do need a hash table for some application? Do you simply swallow the extra expense incurred for modifications?
+1  A: 

Functional programs tend to put more emphasis on recursion. This, in turn, suggests the use of recursive algorithms and recursive data structures. Both lists and trees are recursive structures (the "next" link on a list is another list, and the children of a tree node are both trees).
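As a minimal sketch of that point (hypothetical type and function names), both structures can be defined recursively, and the algorithms that traverse them follow the same recursive shape:

```haskell
-- Both structures are recursive: a list's tail is itself a list,
-- and a tree node's children are themselves trees.
data List a = Nil | Cons a (List a)

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- Recursive algorithms mirror the shape of the data:
listLength :: List a -> Int
listLength Nil         = 0
listLength (Cons _ xs) = 1 + listLength xs

treeSize :: Tree a -> Int
treeSize Leaf         = 0
treeSize (Node l _ r) = 1 + treeSize l + treeSize r

main :: IO ()
main = do
  print (listLength (Cons 1 (Cons 2 (Cons 3 Nil))))
  print (treeSize (Node Leaf 'a' (Node Leaf 'b' Leaf)))
```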

You may want to reconsider if you're looking at extra expense on an algorithm. Why does the hash table (which is O(1) for a non-recursive algorithm) incur an extra expense? What advantage are you gaining by using it, as opposed to a tree or list?

lacqui
A: 

Yes, the primary difference is immutability of the data, which can include code (see higher-order functions). See the Wikipedia page on Purely Functional for a list of the common data types and usages. Whether it's a problem or not depends on how you look at it. There are many advantages to programming in a functional language if it fits the type of task you are working on. A hash table is a kind of associative array, but it isn't the one you want to use in a functional language, because of the rehashing you'd have to do on insert and the poor performance without mutable arrays. Instead, try the Haskell implementation of Data.Map for an associative array.
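For instance, basic `Data.Map` usage looks like this (illustrative data; `phone` is a made-up example):

```haskell
import qualified Data.Map as Map

-- An associative array backed by a balanced binary tree.
phone :: Map.Map String Int
phone = Map.fromList [("alice", 1234), ("bob", 5678)]

main :: IO ()
main = do
  print (Map.lookup "alice" phone)  -- Just 1234
  print (Map.lookup "carol" phone)  -- Nothing
  -- insert returns a *new* map; `phone` itself never changes
  print (Map.lookup "carol" (Map.insert "carol" 9999 phone))  -- Just 9999
```

Lookups and inserts are O(log n) rather than a hash table's amortized O(1), but there is no rehashing and no mutation.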

John Ellinwood
+8  A: 
  • Yes. Typically tuples, lists, and partially-evaluated functions are very common data structures in functional programming languages. Mutable data structures, like arrays and (real) hash tables, are used much less because they don't fit in as well with Haskell. SML (which is also functional, but not lazy) can use arrays more naturally than Haskell, but lists are still more common because they map well to recursive algorithms.
  • I'm not sure how to answer this. A problem for who?
  • There exist implementations of associative arrays ("hash table" equivalent) which can continue to share most of their underlying structure even after different updates. I believe GHC's Data.Map does; also, Edison has quite a few lazy/functional-friendly data structures.
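The sharing in the last bullet can be observed directly: after an update, both the old and the new version of a `Data.Map` remain usable (a small sketch with made-up data):

```haskell
import qualified Data.Map as Map

m0, m1 :: Map.Map Int String
m0 = Map.fromList [(1, "one"), (2, "two")]
m1 = Map.insert 3 "three" m0  -- a new version; m0 is untouched

main :: IO ()
main = do
  print (Map.size m0)  -- the old version is still intact
  print (Map.size m1)
  -- Internally the two versions share most of their balanced-tree
  -- nodes, so the update costs O(log n) extra space, not an O(n) copy.
```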
ephemient
By "Is this a problem," I meant "Is this a problem for functional programmers?" How does SML manage to use arrays more naturally? I thought that the immutability was the biggest problem for arrays, rather than the laziness, so how does SML make it work?
Rob Lachlan
SML has mutable arrays.
Chris Conway
Mutability is a lot friendlier when your language is strict (SML) instead of lazy (Haskell).
ephemient
+10  A: 

The book Purely Functional Data Structures covers your questions in depth, and includes a great mix of theory and implementations, primarily in ML. The appendix also contains Haskell implementations, so you should be able to follow along with a bit of extra page-turning. It is a pretty good (though difficult in parts) read if you are really interested in a thorough answer to your questions. Having said that, I think ephemient gave a superb short answer.

edit: Steven Huwig provided a link to the thesis that the book started as. While I haven't read through it, the only big thing missing (judging from the table of contents) is the Haskell implementations.

Andrew Khosravian
+3  A: 

Chris Okasaki's thesis, Purely Functional Data Structures, is available for free online. It covers many different strategies for immutable persistent data representation.

As far as really needing a hash table, consider that an O(lg n) lookup is only twenty times as slow as an O(1) lookup when you are searching a million elements.
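The arithmetic behind that "twenty times" figure: a balanced tree over a million elements needs about log2(10^6) comparisons per lookup, which works out to roughly 20.

```haskell
main :: IO ()
main =
  -- Comparisons a balanced-tree lookup needs for 10^6 keys:
  print (ceiling (logBase 2 1e6) :: Int)  -- 20
```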

Steven Huwig
Nice link, thank you.
Rob Lachlan
On the hash table: not necessarily. Yes, a tree is O(n log n), but that's not the same thing as 20 times slower just because (log n) is 20.
Paul Johnson
Balanced tree lookup is O(lg n), not O(n lg n). But yes, there are a lot of other factors that can come into play. Typically O-notation is with reference to some class of operation, in this case comparison operations.
Steven Huwig
+1  A: 
  • Are there really significantly different usage patterns for data structures between functional and imperative programming?

Big, gigantic, night and day — largely because side-effects are not tolerated in functional programming.

  • If so, is this a problem?

The problem is for the imperative paradigms, which will be unable to retain efficiency as parallelization becomes more necessary. The only way out for these languages will be to get rid of side-effects, but then they will become broken functional languages — and why should I bother with them when there are already some pretty good, working functional languages? Also, the semantics of functional languages are easier to control; hence, functional programs can be proved correct, whereas their C++ counterparts cannot (yet, anyway). As a result, many formal verification tools are based on functional languages — for example, ACL2 is based on Common Lisp and Cryptol is based on Haskell. Since formal verification is the wave of the future, functional languages can better integrate with those tools. In short, say bye-bye to C, C++ and such — good riddance! Someone should have taken a .30-06 to them a long time ago.

  • What if you really do need a hash table for some application? Do you simply swallow the extra expense incurred for modifications?

The wave of the future is this: you write a functional program without specifying a hash table — the language you use is Cryptol. When you are done and have proven that your program works, you press a button and out pops an efficient version that uses a hash table, if it has been decided that that is the best thing to use.

Nerdling
-1: "The problem is for the imperative paradigms that will be unable to retain efficiency as parallelization becomes more necessary — the only way out for these languages will be to get rid of side-effects but then they will become broken functional languages". There is overwhelming evidence to the contrary. Purely functional solutions are slow on one core and scale badly to multiple cores because they saturate memory bandwidth and stress the GC. When the experts optimize their purely functional code for parallelism they do so by introducing mutation.
Jon Harrop
+3  A: 

Yes, the usage patterns are dramatically different, but no, it's not a problem. If you want a hash table, you usually mean you want a finite map with string keys and fast access. Bentley and Sedgewick's ternary search trees are purely functional, and at least in some cases they outperform hash tables.
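A minimal purely functional sketch of a ternary search tree in the spirit of Bentley and Sedgewick (the names `TST`, `insert`, `lookupTST` are made up for this example; a production version would handle empty keys and balancing):

```haskell
-- Each node splits on one character, with less-than / equal / greater
-- subtrees; a value is attached where a key ends.
data TST v = Empty
           | Node Char (TST v) (TST v) (TST v) (Maybe v)

insert :: String -> v -> TST v -> TST v
insert []     _ t     = t  -- this sketch ignores empty keys
insert (c:cs) v Empty = insert (c:cs) v (Node c Empty Empty Empty Nothing)
insert (c:cs) v (Node k lo eq hi mv)
  | c < k     = Node k (insert (c:cs) v lo) eq hi mv
  | c > k     = Node k lo eq (insert (c:cs) v hi) mv
  | null cs   = Node k lo eq hi (Just v)
  | otherwise = Node k lo (insert cs v eq) hi mv

lookupTST :: String -> TST v -> Maybe v
lookupTST _ Empty = Nothing
lookupTST []     _ = Nothing
lookupTST (c:cs) (Node k lo eq hi mv)
  | c < k     = lookupTST (c:cs) lo
  | c > k     = lookupTST (c:cs) hi
  | null cs   = mv
  | otherwise = lookupTST cs eq

sample :: TST Int
sample = insert "cat" 1 (insert "car" 2 (insert "dog" 3 Empty))

main :: IO ()
main = do
  print (lookupTST "car" sample)  -- Just 2
  print (lookupTST "cow" sample)  -- Nothing
```

Like `Data.Map`, an insert builds a new path of nodes and shares the rest of the old tree.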

As mentioned above, Chris Okasaki's book on purely functional data structures is very good.

Norman Ramsey
+6  A: 

Speaking as someone who's been doing OO for many years, and recently has been building a largeish project requiring a good deal of speed (a real-time automated options trading system) in Haskell:

  • Are there really significantly different usage patterns for data structures between functional and imperative programming?

If you're talking about Haskell, yes, very much so. A large part of this is due to purity, however; the differences are somewhat less in other functional languages where mutable data is more often used. That said, as others have pointed out, recursive code and structures are much more heavily used in all or nearly all functional languages.

  • If so, is this a problem?

It hasn't been one for me, aside from having to spend some time to learn the new way of working. In particular, performance has certainly not been an issue: the system I'm working on runs considerably faster than the previous Java implementation, for example.

  • What if you really do need a hash table for some application? Do you simply swallow the extra expense incurred for modifications?

Generally the problem is not that "you really do need a hash table," but that you need access to certain data within some given time constraints (which may well be, "as fast as possible on some given hardware."). For that, you look around and do what you need to do. If that includes introducing mutability, I don't see a big problem with that, and you can do it in Haskell, though it might not be as convenient as it is in other languages. But do keep in mind, if you have a problem of this nature, it's certainly not going to be as simple as, "use a generic hash table and you're done." Extremely high performance for particular functionality on a particular hardware platform invariably takes a lot of work, and usually more than a few tricks. Preferring one language implementation over another just because it has some particular thing that works better than it does in other languages is, in my opinion, a rather unsophisticated approach to software engineering that is not likely to consistently produce good results.
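As one hedged sketch of "introducing mutability" in Haskell (there are other routes, such as `STRef`s or dedicated mutable hash-table packages): hold an ordinary persistent map in an `IORef` and update it destructively in `IO`:

```haskell
import Data.IORef
import qualified Data.Map as Map

main :: IO ()
main = do
  -- a mutable cell holding an (immutable) map
  ref <- newIORef (Map.empty :: Map.Map String Int)
  -- destructive-style updates, sequenced in IO:
  modifyIORef' ref (Map.insert "hits" 1)
  modifyIORef' ref (Map.adjust (+ 1) "hits")
  m <- readIORef ref
  print (Map.lookup "hits" m)  -- Just 2
```

It is less convenient than mutation in an imperative language, but it is available when the time constraints demand it.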

Curt Sampson