views:

2823

answers:

6

I have a tree data structure that is L levels deep each node has about N nodes. I want to work-out the total number of nodes in the tree. To do this (I think) I need to know what percentage of the nodes that will have children.

What is the correct term for this ratio of leaf nodes to non-leaf nodes in N?

What is the formula for working out the total number nodes in the three?

Update Someone mention Branching factor in one of the answer but it then disappeared. I think this was the term I was looking for. So shouldn't a formula take the branching factor into account?

Update I should have said an estimate about a hypothetical datastructure, not the exact figure!

A: 

If you know nothing else but the depth of the tree then your only option for working out the total size is to go through and count them.

Mark Pim
Actually he also knows the average, or expected, number of children per node. This plus the height of the tree is enough information to work out the average, or expected, number of nodes in the tree.
j_random_hacker
Yep, that was my mistake - I read the question as needing to know the exact size of the tree.
Mark Pim
+1  A: 

The formula for calculating the amount of nodes in depth L is: (Given that there are N root nodes)

NL

To calculate the number of all nodes one needs to do this for every layer:

for depth in (1..L)
    nodeCount += N ** depth

If there's only 1 root node, subtract 1 from L and add 1 to the total nodes count.

Be aware that if in one node the amount of leaves is different from the average case this can have a big impact on your number. The further up in the tree the more impact.

     *                *                 *         N ** 1
    ***              ***               ***        N ** 2
*** *** ***      *** *** ***       *** *** ***    N ** 3

This is community wiki, so feel free to alter my appalling algebra.

David Grant
What do you mean by "starting with N nodes"?
Georg
@gs: That the first level of the tree has N nodes, rather than 1. Updated accordingly.
David Grant
You're right, though actually the sum you are computing in that loop turns out to be expressible as a simple formula, given in GameCat's answer.
j_random_hacker
A: 

If you work with Nested Sets all you have to do is divide the right value of the root node by 2 and you'll have the number of nodes :)

http://stackoverflow.com/questions/272010/searching-for-the-best-php-nested-sets-class-pear-class-excluded/272248#272248

tharkun
+4  A: 

Ok, each node has about N subnodes and the tree is L levels deep.

With 1 level the tree has 1 node
With 2 levels, the tree has 1+N nodes
With 3 levels, the tree has 1+N+N^2 nodes.

The total number of nodes is (N^L-1) / (N-1).

Ok, just a small example why, it is exponential:

                    [NODE]
                      |
                     /|\
                    / | \
                   /  |  \
                  /   |   \
            [NODE]  [NODE] [NODE]
              |
             /|\
            / | \
Gamecat
Need a correction: [..] and the tree is _L_ (was N) levels deep. +1 anyway :)
Andrea Ambu
+1, but it wouldn't hurt to spell out the algebra showing why 1+N+N^2+...+N^L = (N^L-1)/(N-1).
j_random_hacker
Edit: I made Andrea Ambu's suggested change and also fixed a typo.
j_random_hacker
In fact your solution to the equation is a bit off - it should be simply (N^L)-1 (imagine if N were 2 and L were 3 - the sum is 1+2+4=7=(2^3)-1). This can be shown fairly easily through induction.
HenryR
@HenryR: I derived Gamecat's formula independently. If N=2, then your proposed answer is *also* correct as dividing by (N-1) doesn't change anything in this case. It's when N>2 that you need to perform the division to get a correct answer.
j_random_hacker
@j_random_hacker: right, yes, silly me.
HenryR
+1  A: 

If your tree is approximately full, that is every level has its full complement of children except for the last two, then you have between N^(L-2) and N^(L-1) leaf nodes and between N^(L-1) and N^L nodes total.

If your tree is not full, then knowing the number of leaf nodes doesn't help as a totally unbalanced tree will have one leaf node but arbitrarily many parents.

I wonder how precise your statement 'each node has about N nodes' is - if you know the average branching factor, perhaps you can compute the expected size of the tree.

If you are able to find the ratio of leaves to internal nodes, and you know the average number of children, you can approximate this as (n*ratio)^N = n. This won't give you your answer, but I wonder if someone with better maths than me can figure out a way to interpose L into this equation and give you something soluble.

Still, if you want to know precisely, you must iterate over the structure of the tree and count nodes as you go.

HenryR
+1  A: 

Just to correct a typo in the first answer: the total number of nodes for a tree of depth L is (N^(L+1)-1) / (N-1)... (that is, to the power L+1 rather than just L).

This can be shown as follows. First, take our theorem:

1 + N^1 + N^2 + ... + N^L = (N^(L+1)-1)/(N-1)

Multiply both sides by (N-1):

(N-1)(1 + N^1 + N^2 + ... + N^L) = N^(L+1)-1.

Expand the left side:

N^1 + N^2 + N^3 + ... + N^(L+1) - 1 - N^1 - N^2 - ... - N^L.

All terms N^1 to N^L are cancelled out, which leaves N^(L+1) - 1. This is our right hand side, so the initial equality is true.