I'm building an entire application out of immutable objects so that multi-threading and undo become easier to implement. I'm using the Google Collections Library which provides immutable versions of Map, List, and Set.

My application model looks like a tree:

  • Scene is a top-level object that contains a reference to a root Node.
  • Each Node can contain child Nodes and Ports.

An object graph might look like this:

Scene
 |
 +-- Node
      |
      +-- Node
      |    |
      |    +- Port
      |
      +-- Node
           |
           +- Port
           +- Port
If all of these objects are immutable, controlled by a top-level SceneController object:

  • What is the best way to construct this hierarchy?
  • How would I replace an object that is arbitrarily deep in the object tree?
  • Is there a way to support back-links, e.g. a Node having a "parent" attribute?

And more generally:

  • Have any patterns emerged for dealing with this type of data?
  • Is there (academic) literature available on the subject?
  • Is this a good idea?
+2  A: 

If your tree is immutable, then if you want to change it in any way you have to produce a new tree.

This sounds bad, but it's not if all your nodes are also immutable! Since you don't need to make copies of immutable objects, your new tree will mostly refer to parts of the old tree, except for the changes you made.

You'll have to design your tree in such a way that each immutable tree refers to other immutable trees. This way you won't need to reproduce the entire immutable tree either.
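As a rough sketch of that idea (the `Node` class, `withChild` and `replaceAt` are hypothetical names, not part of the question's code or of any library), replacing a node deep in the tree only copies the nodes on the path from the root to the change. Everything else is shared by reference. Plain JDK lists are used here; the immutable lists from Google Collections would serve the same purpose.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable tree node, for illustration only.
final class Node {
    final String name;
    final List<Node> children; // unmodifiable, never changed after construction

    Node(String name, List<Node> children) {
        this.name = name;
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    // Returns a copy of this node with the child at `index` swapped out.
    // All other children are shared with the original node.
    Node withChild(int index, Node replacement) {
        List<Node> copy = new ArrayList<>(children);
        copy.set(index, replacement);
        return new Node(name, copy);
    }

    // Replaces the node reached by following `path` (child indices starting
    // at this node). Only the nodes along that path are copied.
    Node replaceAt(List<Integer> path, Node replacement) {
        if (path.isEmpty()) {
            return replacement;
        }
        int index = path.get(0);
        Node newChild = children.get(index)
                .replaceAt(path.subList(1, path.size()), replacement);
        return withChild(index, newChild);
    }
}
```

Calling `root.replaceAt(Arrays.asList(0, 0), newNode)` returns a new root; the old root is untouched, which is also what makes undo cheap: keeping the old root is keeping the old version.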

But if you go the immutable tree route, then you can't have back links. Otherwise you can't reuse sub trees.

Pyrolistical
I figured out that the model is very much like the Git version control system, where changing a file causes the file, and thus the tree and all the above trees, to change. For back links, would there not be an "alias" approach or "path specifier" that can be resolved for a certain version of a tree?
Frederik
What do you mean by back links? Because if a node links to the parent and the parent changes you'll have to regenerate all child and grandchild nodes, etc. That's a lot of work for one change.
Pyrolistical
Well, a getParent() method would be a backlink to the Node's parent. If a Node would have a parent attribute, I would be unable to reuse the original Node. I was wondering if there was a smarter way to do this, equivalent to Unix's "symbolic links" for example.
Frederik
I am not sure you can do that without higher-level structures over references, or without building your own pseudo-reference model, which might not perform well.
Pyrolistical
+3  A: 

There are two concepts of interest here. First, persistent data structures. If all elements of the tree are immutable, then one can derive a new tree from the original tree by replacing some parts, but referring to the older parts, thus saving time and memory.

For example, if you were to add a third Port to the Node that already has two Ports, you'd have to create a new Scene, a new copy of the Scene's root Node, and a new copy of the Node that you are changing. The other Node and all of the existing Ports do not need to be created anew; you just refer to them from the new Scene and Nodes.
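The Port example can be sketched roughly as follows. The class shapes and the `withPort`, `withChild` and `addPortToChild` methods are assumptions made for illustration; the question does not show its actual classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal versions of the question's classes.
final class Port {
    final String name;
    Port(String name) { this.name = name; }
}

final class Node {
    final String name;
    final List<Node> children;
    final List<Port> ports;

    Node(String name, List<Node> children, List<Port> ports) {
        this.name = name;
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
        this.ports = Collections.unmodifiableList(new ArrayList<>(ports));
    }

    // New Node with one more Port; the existing Port objects are reused.
    Node withPort(Port port) {
        List<Port> newPorts = new ArrayList<>(ports);
        newPorts.add(port);
        return new Node(name, children, newPorts);
    }

    // New Node with one child replaced; the other children are reused.
    Node withChild(int index, Node child) {
        List<Node> newChildren = new ArrayList<>(children);
        newChildren.set(index, child);
        return new Node(name, newChildren, ports);
    }
}

final class Scene {
    final Node root;
    Scene(Node root) { this.root = root; }

    // Creates exactly three objects of the model: the changed Node,
    // a new root Node, and a new Scene.
    Scene addPortToChild(int childIndex, Port port) {
        Node changed = root.children.get(childIndex).withPort(port);
        return new Scene(root.withChild(childIndex, changed));
    }
}
```

After `scene.addPortToChild(1, new Port("p4"))`, the untouched child Node in the new Scene is the very same object as in the old Scene, and the old Scene still sees only two Ports.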

The other concept is that of a Zipper. A zipper is a way to "navigate" through a persistent data structure to optimize local changes. For instance, if you added four new Ports instead of just one, but you added each Port one at a time, you'd have to create four new Scenes and eight new Nodes. With a zipper, you defer such creations until you are done, saving on those intermediary objects.
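A simplified zipper might look like the sketch below. It is an assumption-laden illustration, not a faithful port of Huet's formulation (which keeps left and right siblings separately): here each zipper just remembers the zipper it came from and the child index it descended into. Ports are reduced to plain strings to keep it short, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable node; ports are just names in this sketch.
final class Node {
    final String name;
    final List<Node> children;
    final List<String> ports;

    Node(String name, List<Node> children, List<String> ports) {
        this.name = name;
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
        this.ports = Collections.unmodifiableList(new ArrayList<>(ports));
    }

    Node withPort(String port) {
        List<String> copy = new ArrayList<>(ports);
        copy.add(port);
        return new Node(name, children, copy);
    }

    Node withChild(int index, Node child) {
        List<Node> copy = new ArrayList<>(children);
        copy.set(index, child);
        return new Node(name, copy, ports);
    }
}

// A focus on one node plus the way back up to the root.
final class Zipper {
    final Node focus;
    private final Zipper parent; // null when the focus is the root
    private final int index;     // position of focus among the parent's children

    private Zipper(Node focus, Zipper parent, int index) {
        this.focus = focus;
        this.parent = parent;
        this.index = index;
    }

    static Zipper atRoot(Node root) { return new Zipper(root, null, -1); }

    Zipper down(int i) { return new Zipper(focus.children.get(i), this, i); }

    // Swaps the focused node; ancestors are NOT rebuilt yet.
    Zipper replace(Node newFocus) { return new Zipper(newFocus, parent, index); }

    // Moves up, rebuilding the parent only if the focus actually changed.
    Zipper up() {
        if (parent == null) throw new IllegalStateException("already at the root");
        if (parent.focus.children.get(index) == focus) return parent;
        return parent.replace(parent.focus.withChild(index, focus));
    }

    // Walks back to the top, rebuilding each ancestor at most once.
    Node root() {
        Zipper z = this;
        while (z.parent != null) z = z.up();
        return z.focus;
    }
}
```

In this sketch, four calls to `z = z.replace(z.focus.withPort(...))` still create four versions of the focused Node, but the ancestors are rebuilt only once, when `root()` is called, rather than once per added Port.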

The best explanation I ever read about zippers is here.

Now, using a zipper to navigate a data structure removes the need to have back-links. You can have back-links in an immutable structure, by clever use of recursive constructors. However, such a data structure would not be persistent. Non-persistent immutable data structures have lousy modification performance, because you need to copy the whole data structure each time.
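One possible reading of "recursive constructors" in Java is sketched below: each child is described by a function that builds it once its parent exists, and the parent's constructor invokes those functions. The names are hypothetical, and letting `this` escape from a constructor is generally discouraged in Java, so treat this as an illustration rather than a recommended design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical immutable node with a back-link to its parent.
final class Node {
    final String name;
    final Node parent;         // null for the root
    final List<Node> children;

    private Node(String name, Node parent, List<Function<Node, Node>> childFactories) {
        this.name = name;
        this.parent = parent;
        List<Node> built = new ArrayList<>();
        for (Function<Node, Node> factory : childFactories) {
            // Assumption of this sketch: `this` is handed out before the
            // constructor finishes, so children can store it as their parent.
            built.add(factory.apply(this));
        }
        this.children = Collections.unmodifiableList(built);
    }

    // Builds a whole tree top-down, starting from the root.
    static Node root(String name, List<Function<Node, Node>> childFactories) {
        return new Node(name, null, childFactories);
    }

    // Describes a child that will be constructed once its parent is known.
    static Function<Node, Node> child(String name, List<Function<Node, Node>> childFactories) {
        return parent -> new Node(name, parent, childFactories);
    }
}
```

Because every node reaches the root through its `parent` chain, no node of an old tree can be reused in a new one: any change means running the factories again for the entire tree, which is the cost described above.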

As for academic literature, I recommend Purely Functional Data Structures, by Okasaki (dissertation PDF, fully fledged book).

Daniel
+1 for both mentioning Zippers and Okasaki who, quite literally, wrote the book on this subject. Another interesting concept is Clojure 1.1's *transient* data structure. (Basically, a temporarily non-persistent datastructure.) In fact, Clojure in general is interesting: if Okasaki wrote the book on functional datastructures, Rich Hickey wrote the library. And, BTW: the Clojure datastructures are *specifically* written in such a way that they *can* be used as a Java library. They are completely independent from the Clojure language and the Clojure standard library.
Jörg W Mittag