ansaurus

Question

How is memory-efficient non-destructive manipulation of collections achieved in functional programming?

Answer 1

+5 A:

While the referenced objects are the same in your code, I beleive the storage space for the references themselves and the structure of the list is duplicated by take. As a result, while the referenced objects are the same, and the tails are shared between the two lists, the "cells" for the initial portions are duplicated.

I'm not an expert in functional programming, but maybe with some kind of tree you could achieve duplication of only log(n) elements, as you would have to recreate only the path from the root to the inserted element.

Jérémie Koenig 2010-01-03 02:57:00

I think you're right on the tree structure, I've seen exactly such an example discussed elsewhere. I wanted to try (linked) lists first because these were the simplest collection type I could think of.Do you mention `take` because that's where a new cons cell is created by `x::(take xs (n-1))`?

stakx 2010-01-03 03:05:01

Indeed -- on the other hand `skip` returns the original tail,so it is not duplicated.

Jérémie Koenig 2010-01-03 03:08:28

Answer 2

+24 A:

Juliet 2010-01-03 03:56:50

Ha, I can always tell a Juliet answer before I get to the bottom. It's all like, "Boom! Charts!"

Chuck 2010-01-03 04:32:29

stakx 2010-01-03 11:30:29

Answer 3

A:

You may be interested in reduction strategies of expressions in functional programming languages. A good book on the subject is The Implementation of Functional Programming Languages, by Simon Peyton Jones, one of the creators of Haskell. Have a look especially at the following chapter Graph Reduction of Lambda Expressions since it describes the sharing of common subexpressions. Hope it helps, but I'm afraid it applies only to lazy languages.

primodemus 2010-01-03 04:08:44

Answer 4

+4 A:

It sounds to me like your question is primarily about immutable data, not functional languages per se. Data is indeed necessarily immutable in purely functional code (cf. referential transparency), but I'm not aware of any non-toy languages that enforce absolute purity everywhere (though Haskell comes closest, if you like that sort of thing).

Roughly speaking, referential transparency means that no practical difference exists between a variable/expression and the value it holds/evaluates to. Because a piece of immutable data will (by definition) never change, it can be trivially identified with its value and should behave indistinguishably from any other data with the same value.

Therefore, by electing to draw no semantic distinction between two pieces of data with the same value, we have no reason to ever deliberately construct a duplicate value. So, in cases of obvious equality (e.g., adding something to a list, passing it as a function argument, &c.), languages where immutability guarantees are possible will generally reuse the existing reference, as you say.

Likewise, immutable data structures possess an intrinsic referential transparency of their structure (though not their contents). Assuming all contained values are also immutable, this means that pieces of the structure can safely be reused in new structures as well. For example, the tail of a cons list can often be reused; in your code, I would expect that:

(skip 1 L) === (skip 2 M)

...would also be true.

Reuse isn't always possible, though; the initial portion of a list removed by your skip function can't really be reused, for instance. For the same reason, appending something to the end of a cons list is an expensive operation, as it must reconstruct a whole new list, similar to the problem with concatenating null-terminated strings.

In such cases, naive approaches quickly get into the realm of awful performance you were concerned about. Often, it's necessary to substantially rethink fundamental algorithms and data structures to adapt them successfully to immutable data. Techniques include breaking structures into layered or hierarchical pieces to isolate changes, inverting parts of the structure to expose cheap updates to frequently-modified parts, or even storing the original structure alongside a collection of updates and combining the updates with the original on the fly only when the data is accessed.

Since you're using F# here I'm going to assume you're at least somewhat familiar with C#; the inestimable Eric Lippert has a slew of posts on immutable data structures in C# that will probably enlighten you well beyond what I could provide. Over the course of several posts he demonstrates (reasonably efficient) immutable implementations of a stack, binary tree, and double-ended queue, among others. Delightful reading for any .NET programmer!

camccann 2010-01-03 04:20:17

Thank you for the pointer to Eric Lippert's blog. It's indeed very helpful, since I program mostly in C# (but hope to improve my skills through a better understanding of functional programming).

stakx 2010-01-03 11:04:30

Oh, and thanks not just for the reference, but for your post as well, of course! I found the remark about breaking structures into hierarchical / layered data to be very useful, too.

stakx 2010-01-03 11:33:21

Answer 5

+10 A:

Norman Ramsey 2010-01-03 05:24:54

ansaurus

tags:

views:

answers:

How is memory-efficient non-destructive manipulation of collections achieved in functional programming?

This is how far I've got until now:

My question:

related questions