Store hierarchies in a way that is resistant to corruption

A:

I was thinking today about the best way to store a hierarchical set of nodes

So you are writing a filesystem? ;-)

The most obvious way to represent this (to me at least) would be for each node to have nextSibling and childNode pointers

Why? The sibling information is present at the parent node, so all you need is a pointer back to the parent. A doubly linked-list, so to speak.

What are ways that you could store a hierarchy of nodes that requires few changes to add or delete a node and is resilient to corruption of a few nodes?

There are actually two different questions involved here.

Is the data corrupt?
How do I fix corrupt data (aka self healing systems)?

Answers to these two questions will determine the exact nature of the solution.

Data Corruption

If your only aim is to know if your data is good or not, store a hash digest of child node information with the parent.

Self Healing Structures

Any self healing structure will need the following information:

Is there a corruption? (See above)
Where is the corruption?
Can it be healed?

Different algorithms exist to fix data with varying degree of effectiveness. The root idea is to introduce redundancy. Reconstruction depends on your degree of redundancy. Since the best guarantees are given by the most redundant systems -- you'll have to choose.

I believe there is some scope of narrowing down your question to a point where we can start discussing individual bits and pieces of the puzzle.

dirkgently 2009-03-08 16:50:52

A:

When you refer to corruption, are you talking about RAM or some other storage? Perhaps during transmission over some medium?

In any case, when you are dealing with data corruption you are talking about an entire field of computer science that deals with error detection and correction.

When you talk about losing a node, the first thing you have to figure out is 'how do I know I've lost a node?', that's error detection.

As for the problem of protecting data from corruption, pretty much the only way to do this is with redundancy. The amount of redundancy is determined by what limit you want to put on the degree of corruption you would like to be able to recover from. You can't really protect yourself from this with a clever structure design as you are just as likely to suffer corruption to the critical 'clever' part of your structure :)

The ever-wise wikipedia is a good place to start: Error detection and correction

Arnold Spence 2009-03-08 17:00:37

A:

A simple option is to store reference to root node in every node - this way it is easy to detect orphan nodes.

Another interesting option is to store hierarchy information as a descendants (transitive closure) table. So for 1.2.3 node you'd have the following relations:

1., 1.2.3. - root node is ascendant of 1.2.3.

1.2., 1.2.3. - 1.2. node is ascendant of 1.2.3.

1., 1.2. - root node is ascendant of 1.2. etc...

This table can be more resistant to errors as it holds some redundant info.

Goran

Goran 2009-03-08 19:44:30

A:

The typical method to store a hierarchy, is to have a ParentNode property/field in every node. For root the ParentNode is null, for all other nodes it has a value. This does mean that the tree may lose entire branches, but in memory that seems unlikely, and in a DB you can guard against that using constraints.

This approach doesn't directly support the finding of all siblings, if that is a requirement I'd add another property/field for depth, root has depth 0, all nodes below root have depth 1, and so on. All nodes with the same depth are siblings.

MrTelly 2009-03-09 09:19:13

ansaurus

tags:

views:

answers:

Store hierarchies in a way that is resistant to corruption

related questions