ansaurus

Question

Explaining persistent data structures in simple terms

Answer 1

+3 A:

Try this:

Persistent data structures are data structures which, unlike arrays or lists, always preserve their original structure and contents.

They are effectively immutable - instead of changing such a structure, any computation will yield a new copy leaving the original structure unchanged. Whereas conventional data structures follow an imperative (change-based) philosophy, persistent ones are commonly used in functional programming where, in extreme cases, no change is allowed at all. The advantage of this paradigm is that persistent data are much safer, often easier to handle and even preventing bugs connected to unintended change and multithreading issues. Many problems can be formulated functionally in a much more concise way.

Example for a non-persistent data structure: The list (vector-list)

list = [1, 2, 3]
# List is [1, 2, 3]

list += [4]
# Added an element at the end

list[0] = 42
# We changed the first element to 42
# List is [42, 2, 3, 4]

Persistence example: Linked lists (Haskell)

list = [1, 2, 3]
# List is [1, 2, 3]

list2 = 0 : list
# List2 combines [0] and [1, 2, 3] to [0, 1, 2, 3]
# List is still [1, 2, 3]

list3 = updateAt 0 42 list
# List3 contains the elements of list2 with the first one set to 42
# List and List2 still hold their original values

Dario 2009-09-16 13:19:13

Kind of a combination of "immutable" and "copy on write". For instance, these lists are very tuplish - note the behavior of a1 =(1,2,3); a2 = a1; a1 += (4,). Except tuples do not permit indexed assignment.

Paul McGuire 2009-09-16 13:22:46

Answer 2

+4 A:

Explaining "persistent" data structures is much, much easier if you use the word the Python community uses: "immutable".

Everyone understands immutable numbers. 2+2 can't update 2, it has to create a new value.

Everyone understand mutable lists. [2, 3].append(4) actually updates the [2,3] list. The list is mutable, the individual numbers are not. See above.

The cool step is explaining immutable tuples. (2,3).append(4) can't exist because a tuple is an immutable ("persistent") data structure. In this case, neither the tuple nor the numbers are mutable.

Immutable strings, similarly, makes perfect sense. "hello" + "there" cannot update the string "hello", but can create a new string.

Some people balk at this. Refer back to tuples and integers. It's simpler to have strings behave like numbers than to behave like lists.

Immutable fractions and decimal numbers are good examples of immutable ("persistent") data structures.

S.Lott 2009-09-16 13:43:24

Perhaps I could use the word immutable more, but I think the word "persistent" is necessary. Otherwise, how would you differentiate a PersistentList from a tuple?

Jason Baker 2009-09-16 14:30:28

I don't think you can make the case that they're a deep, important difference between a PersistentList and a Tuple. A PersistentList may reuse a sublist as an optimization, but that's just a small "prevent a copy" technique. It's not profound enough to dwell on.

S.Lott 2009-09-16 16:00:16

I agree that the need is very much a niche need, but it's still an important difference to understand. Maybe the difference isn't earth-shattering, but there needs to be *some* distinction. I mean if I use the term "Immutable list", my first reaction would be to think of a tuple.

Jason Baker 2009-09-17 16:25:58

@Jason Baker: And you'd be right. An immutable list *is* a tuple. The "reuse" subtlety of a "persistent" data structure is a subtlety that's almost incomprehensible in a modern era of multi-gigabyte memory and nano-second clocks.

S.Lott 2009-09-17 17:00:03

Answer 3

+1 A:

Other useful concept words to describe persistent data structures are "Snapshot" or "Version", which you may precede by "virtual" because an efficient implementation of a persistent structure would not merely make multiple copies. Persistent Data Structure libraries strive to store and process the structures in efficient fashion.

Partial persistence allows modifications to the current structure but allows previous versions of that structure to be queried. Full persistence allow any version of the structure to be modified and queried. In both cases, any modification does not impact the integrity of previous versions which can still be shown/queried as they were prior to the changes.

I think that for the presentation page of your project, you can use much of the wording from Dario and S.Lott, trying to address the following:

Brief description of what Persistent Data Structures are
possibly a link or two to basic introduction articles of the concept
Quick exit point about this project not being about data persistence (to disk and such)
Where this is used
What it can do for you
Brief code snippets and their output

Also, do remove [from the home page] details about the data type / isinstance and all that. This is valuable info, but only of use for in a user manual or some other in-depth resource.

mjv 2009-09-16 14:08:24

Thanks for the comments about isinstance. I was actually debating whether to include it or not.

Jason Baker 2009-09-16 14:26:20

Answer 4

+1 A:

The sort of phrasing I would suggest is that the value of the given structure is persistent across function calls, and that any function calls that result in a change to the structure result in a new structure being created based on copy-on-change semantics.

For example, given a cons type structure L defined as cons(1,cons(2, cons(3, None))), prepending the value 0 to this list results in a new list L' defined as cons(0, L), which expands to cons(0, cons(1,cons(2, cons(3, None)))). However in order to append the value 4 to this list, this requires a change to a value, which results in a copy being created which contains the change: cons(0, cons(1,cons(2, cons(3, cons(4, None)))))

Obviously you can have more efficient implementations that this, but this is one logical representation of what can occur.

Personally prefer the term immutable list, which you can say differs from an ordinary python (ordered) tuple in that it defines appending, prepending, and concatenation, since that's probably clearer to the average developer :-)

Michael Sparks 2009-09-19 16:30:49

I thought this sounded like a smart answer. Then I saw who wrote it and it all made sense. :-)

Jason Baker 2009-10-03 15:30:07

Answer 5

+1 A:

I think the key point to make is that persistent data structures are an optimization for handling immutable values.

But they are an important optimization, even with fast cpu and big memory.

For example, Java strings are immutable but they do NOT have a persistent data structure behind the scenes. That means you have to use things like StringBuilder. If they had a persistent data structure then you could just concatenate and not worry about it.

Immutability is becoming important because it helps deal with concurrency. But to make immutability efficient requires persistent data structures.

Clojure has some good explanations of persistent data structures.

Andrew McKinlay 2009-10-02 23:01:02

Answer 6

A:

When you alter a mutable data structure (e.g. insert into a hash table) it changes in-place and the previous version is lost. When you alter a persistent data structure (e.g. insert into a Map) you get a new data structure and the previous version remains valid.

Jon Harrop 2010-05-21 16:08:04

ansaurus

tags:

views:

answers:

Explaining persistent data structures in simple terms

related questions