ansaurus

Question

Answer 1

+1 A:

We can make a B-tree in one pass, but it may not be the optimal storage method. Depending on how often you make sequential queries vs random access ones, it may be better to store it in sequence and use binary search to service a random access query.

That said: assume that each record in your b-tree holds (m - 1) keys (m > 2, the binary case is a bit different). We want all the leaves on the same level and all the internal nodes to have at least (m - 1) / 2 keys. We know that a full b-tree of height k has (m^k - 1) keys. Assume that we have n keys total to store. Let k be the smallest integer such that m^k - 1 > n. Now if 2 m^(k - 1) - 1 < n we can completely fill up the inner nodes, and distribute the rest of the keys evenly to the leaf nodes, each leaf node getting either the floor or ceiling of (n + 1 - m^(k - 1))/m^(k - 1) keys. If we cannot do that then we know that we have enough to fill all of the nodes at depth k - 1 at least halfway and store one key in each of the leaves.

Once we have decided the shape of our tree, we need only do an inorder traversal of the tree sequentially dropping keys into position as we go.

deinst 2010-08-03 21:53:26

Actually, if you want to be completely precise B-tree (leaf size n > 2) will allow you to find the starting point for the sequential read faster (as log2 x > logn x, for n > 2). Of course this would only be noticed in a very special case: large number of records and small sequential read (for example: paging range search results found from large recordset).

Unreason 2010-08-04 08:36:02

Answer 2

A:

Optimal meaning that an inorder traversal of the data will always be seeking forward through the file (or mmaped region), and a random lookup is done in a minimal number of seeks.

cjx 2010-08-03 23:52:42

Answer 3

+2 A:

Constructing a "B+ tree" to those specifications is simple.

Choose your branching factor k.
Write the sorted data to a file. This is the leaf level.
To construct the next highest level, scan the current level and write out every k^th item.
Stop when the current level has k items or fewer.

Example with k = 2:

0 1|2 3|4 5|6 7|8 9
0   2  |4   6  |8
0       4      |8
0               8

Now let's look for 5. Use binary search to find the last number less than or equal to 5 in the top level, or 0. Look at the interval in the next lowest level corresponding to 0:

0       4

Now 4:

        4   6

Now 4 again:

4 5

Found it. In general, the j^th item corresponds to items jk though (j+1)k-1 at the next level. You can also scan the leaf level linearly.

2010-08-04 06:39:06

ansaurus

tags:

views:

answers:

Sequentially Constructing Full B-Trees

related questions