views:

150

answers:

2

Hello, Clojure truly piqued my interest, and I started going through a tutorial on it: http://java.ociweb.com/mark/clojure/article.html

Consider these two lines mentioned under "Set":

(def stooges (hash-set "Moe" "Larry" "Curly")) ; not sorted
(def more-stooges (conj stooges "Shemp")) ; -> #{"Moe" "Larry" "Curly" "Shemp"}

My first thought was that the second operation should take constant time to complete; otherwise functional language might have little benefit over an object-oriented one. One can easily imagine a need to start with [nearly] empty set, and populate it and shrink it as we go along. So, instead of assigning the new result to more-stooges, we could re-assign it to itself.

Now, by the marvelous promise of functional languages, side effects are not to be concerned with. So, sets stooges and more-stooges should not work on top of each other ever. So, either the creation of more-stooges is a linear operation, or they share a common buffer (like Java's StringBuffer) which would seem like a very bad idea and conflict with immutability (subsequently stooges can drop an element one-by-one).

I am probably reinventing a wheel here. it seems like the hash-set would be more performant in clojure when you start with the maximum number of elements and then remove them one at a time until empty set as oppose to starting with an empty set and growing it one at a time.

The examples above might not seem terribly practical, or have workarounds, but the object-oriented language like Java/C#/Python/etc. has no problem with either growing or shrinking a set one or few elements at a time while also doing it fast.

A [functional] language which guarantees(or just promises?) immutability would not be able to grow a set as fast. Is there another idiom that one can use which somehow can help avoiding doing that?

For someone familiar with Python, I would mention set comprehension versus an equivalent loop approach. The running time of the two is tiny bit different, but that has to do with relative speeds of C, Python, interpreter and not rooted in complexity. The problem I see is that set comprehension is often a better approach, but NOT ALWAYS the best approach, for the readability might suffer a great deal.

Let me know if the question is not clear.

+3  A: 

Clojure's data structures are persistent, which means that they are immutable but use structural sharing to support efficient "modifications". See the section on immutable data structures in the Clojure docs for a more thorough explanation. In particular, it states

Specifically, this means that the new version can't be created using a full copy, since that would require linear time. Inevitably, persistent collections are implemented using linked data structures, so that the new versions can share structure with the prior version.

These posts, as well as some of Rich Hickey's talks, give a good overview of the implementation of persistent data structures.

Aaron Novstrup
How complicated are deletes? Suppose I have 10 different sets or dictionaries, where they are created incrementally. And then I delete a node that is common to all 10 from the very first set/dict (the one containing just one element). What will clojure do and how long will it take?
Hamish Grubijan
It doesn't matter whether the set/dictionary is built incrementally or all at once. A delete is just as efficient as an add -- it's effectively a constant time operation.
Aaron Novstrup
+8  A: 

The core Immutable data structures are one of the most fascinating parts of the language for me also. their is a lot to answering this question and Rich does a really great job of it in this video:

http://blip.tv/file/707974

The core data structures:

  • are actually fully immutable
  • the old copies are also immutable
  • performance does not degrade for the old copies
  • access is constant (actually bounded <= a constant)
  • all support efficient appending, concatenating (except lists and seqs) and chopping

How do they do this???

  • the secret: it's pretty much all trees under the hood (actually a trie).

But what if i really want to edit somthing in place?

  • you can use clojure's transients to edit a structure in place and then produce a immutable version (in constant time) when you are ready to share it.

as a little background: a Trie is a tree where all the common elements of the key are hoisted up to the top of the tree. the sets and maps in clojure use trie where the indexes are a hash of the key you are looking for. it then breaks the hash up into small chunks and uses each chunk as the key to one level of the hash-trie. This allows the common parts of both the new and old maps to be shared and the access time is bounded because there can only be a fixed number of branches because the hash used as in input has a fixed size.

Using these hash tries also helps prevent big slowdowns during the re-balancing used by a lot of other persistent data structures. so you will actually get fairly constant wall-clock-access-time.

I really reccomend the (relativly short)_ book: Purely Functional Data Structures In it he covers a lot of really interesting structures and concepts like "removing amortization" to allow true constant time access for queues. and things like lazy-persistent queues. the author even offers a free copy in pdf here

Arthur Ulfeldt
Is a "trei" different to a trie or is it just a typo?
poolie
its trei (pronounced try) I just had a bit of keyboard trouble :)
Arthur Ulfeldt
From the wikipedia article: "The term trie comes from 'retrieval.' Following the etymology, the inventor, Edward Fredkin, pronounces it 'tree'. However, it is pronounced 'try' by other authors."
Aaron Novstrup