I have two tree structures that represent snapshots of a directory structure at two different points in time. Directories may have been added, removed or modified between the snapshots. I need to walk the two trees simultaneously and mark the newer with the differences between the two - i.e. flag nodes as New, Modified, Deleted, Unchanged, adding any deleted nodes, so that the end result is the full superset of the two snapshots.
Typically, the trees are likely to be about 10 deep but very wide, containing hundreds of thousands, potentially millions of nodes. I want to skip large chunks of the trees by comparing hash codes at each node and only continuing to recurse where the codes don't match.
Is there an algorithm that could be my friend here? Any other advice?