There are two binary trees T1 and T2 which store character data, duplicates allowed.
How can I find whether T2 is a subtree of T1 ? .
T1 has millions of nodes and T2 has hundreds of nodes.
views:
690answers:
7Traverse T1. If the current node is equal to the root node of T2, traverse both of the trees (T2 and the current subtree of T1) at the same time. Compare the current node. If they are always equal, T2 is a subtree of T1.
If your trees are not sorted in any way, I don't see any other way than to do a brute-force search: walk through tree T1
and check to see if you reach a node which matches the first node of tree T2
. If not, continue traversing T1
. If so, check if the next nodes match, until you find the end of T2
, in which case you have a hit: your tree T2
is indeed a subtree of T1
.
If you know the depth of every single node of T1
(from leaf to node), you could skip any nodes which are not as deep as the subtree you are looking for. This could help you eliminate a lot of needless comparisons. Say that T1
and T2
are well balanced, then tree T1
will have a total depth of 20 (2**20
> 1,000,000
) and tree T2
will have a depth of 7 (2**7
> 100
). You'll just have to walk the 13 first layers of T1
(8192 nodes -- or is that 14 layers and 16384 nodes?) and will be able to skip about 90% of T1
...
However, if by subtree you mean that leaf nodes of T2
are also leaf nodes of T1
, then you could do a first traversal of T1
and compute the depth of every node (distance from leaf to node) and then only check the subtrees which have the same depth as T2
.
I suggest an algorithm, that uses preprocessing:
1) Pre-process the T1 tree, keeping the list of all possible subtree roots (the cache list will have a millions of entries);
2) Sort the cached list of roots by the descending order of data, kept in root. You may choose more elegant sorting function, for example, parse a character tree into string.
3) Use binary search to locate the necessary sub-tree. You may quickly reject subtrees, with the number of nodes, not equal to the T2 number of nodes (or with the different depth).
Note that steps 1) and 2) must be done only once, then you can test many sub-tree candidates, using the same preprocessed result.
I am not sure, whether my idea is correct. Nevertheless, for your persual.
- Conduct a postorder walk in Tree 1 & Tree 2, call it P1 and P2.
- Compare P1 & P2. If they are different. The tree is not subtree. Exit.
- If they are same, or P1 contained in P2. You can decide which one is sub tree.
Hi,
How much duplication is in T1? Is T1 sorted or does it exist some kind of index on it?
Depending of the answers you can choose better methods...
Regards.
If you are memory/storage bound (i.e. can't pre-manipulate and store the trees in alternate forms), you probably won't be able to do anything better than the brute force search suggested by some other answers (traverse P1 looking for matching root of P2, traverse both to determine whether the node is in-fact the root of a matching sub-tree, continue with original traversal if it isn't a match). This search operates in O(n * m) where n is the size of P1 and m is the size of P2. With depth checks and other potential optimizations depending on the tree data you have available, this man be optimized a bit, but it's still generally O(n * m). Depending on your specific circumstances, this may be the only reasonable approach.
If you're not memory/storage bound and don't mind a bit of complexity, I believe this could be improved to O(n + m) by reducing to a longest common substring problem with the help of a generalized suffix tree. Some discussion on this for a similar problem can be found here. Maybe when I have more time, I'll come back and edit with more specifics on an implementation.