tags:

views:

84

answers:

2

Let's say that I have a tree of objects of which every one have a string representation. I want to create a SHA1 digest on the whole tree.

The easiest way would be to recursively go over each node of the tree. For each node I would concatenate (as simple strings) the SHA1 digests of all the children, add the string representation of the given nod to this concatenated string, and do a SHA1 on it. This would be the SHA1 digest of the given node.

The question is will this digest be just as "good" as if I would have concatenated the string representation of the child nodes, and not the digests of the child nodes?

Thanks

+1  A: 

This would be better than hashing the concatenation of the children. Consider the following tree:

  • A
    • AA
    • AB

When concatenated, this becomes "AAAB". Contrast with the following tree:

  • A
    • AAA
    • B

Different, but the hash of the concatenation would be the same.

David Grant
+1  A: 

Assume that you can select a Unicode character that is never permitted in the label of a node.

Then, you can use a stream API (like the Java MessageDigest) classes to feed all the labels, in tree-order, inserting the reserved delimiter character in between.

At the end you have one relatively compact digest, without paying for an extra level of SHA calculations.

EDIT

The scheme above isn't quite right, but, then again, neither is the original question, insofar it doesn't encode the structure of the tree. The code needs to construct some sort of ID for each node that is stronger than just the traversal order and include that in the hash input for the node, so that trees with different shapes but the same labels don't hash identically.

The original scheme in the answer works if the tree order is important but the precise structure is not.

bmargulies
Looking at David Grant's answer and yours, I wonder if using the forbidden delimiter character would be enough to disambiguate between different tree structures. I don't think it would. What if there were 2 (or more) forbidden characters, and these were used to encode a counter of when the node is visited in the traversal?
GregS
Nevermind, my scheme won't work either. The problem I have in mind is e.g. a tree with say three nodes each containing the letter 'A', but one tree is a perfect binary and the other is degenerate left-handed. You'd want these to have different hashes, no?
GregS
@GregS I think you are right, let me edit.
bmargulies