views:

76

answers:

3

If I have a sorted set of data, which I want to store on disk in a way that is optimal for both reading sequentially and doing random lookups on, it seems that a B-Tree (or one of the variants is a good choice ... presuming this data-set does not all fit in RAM).

The question is can a full B-Tree be constructed from a sorted set of data without doing any page splits? So that the sorted data can be written to disk sequentially.

+1  A: 

We can make a B-tree in one pass, but it may not be the optimal storage method. Depending on how often you make sequential queries vs random access ones, it may be better to store it in sequence and use binary search to service a random access query.

That said: assume that each record in your b-tree holds (m - 1) keys (m > 2, the binary case is a bit different). We want all the leaves on the same level and all the internal nodes to have at least (m - 1) / 2 keys. We know that a full b-tree of height k has (m^k - 1) keys. Assume that we have n keys total to store. Let k be the smallest integer such that m^k - 1 > n. Now if 2 m^(k - 1) - 1 < n we can completely fill up the inner nodes, and distribute the rest of the keys evenly to the leaf nodes, each leaf node getting either the floor or ceiling of (n + 1 - m^(k - 1))/m^(k - 1) keys. If we cannot do that then we know that we have enough to fill all of the nodes at depth k - 1 at least halfway and store one key in each of the leaves.

Once we have decided the shape of our tree, we need only do an inorder traversal of the tree sequentially dropping keys into position as we go.

deinst
Actually, if you want to be completely precise B-tree (leaf size n > 2) will allow you to find the starting point for the sequential read faster (as log2 x > logn x, for n > 2). Of course this would only be noticed in a very special case: large number of records and small sequential read (for example: paging range search results found from large recordset).
Unreason
A: 

Optimal meaning that an inorder traversal of the data will always be seeking forward through the file (or mmaped region), and a random lookup is done in a minimal number of seeks.

cjx
+2  A: 

Constructing a "B+ tree" to those specifications is simple.

  1. Choose your branching factor k.
  2. Write the sorted data to a file. This is the leaf level.
  3. To construct the next highest level, scan the current level and write out every kth item.
  4. Stop when the current level has k items or fewer.

Example with k = 2:

0 1|2 3|4 5|6 7|8 9
0   2  |4   6  |8
0       4      |8
0               8

Now let's look for 5. Use binary search to find the last number less than or equal to 5 in the top level, or 0. Look at the interval in the next lowest level corresponding to 0:

0       4

Now 4:

        4   6

Now 4 again:

        4 5

Found it. In general, the jth item corresponds to items jk though (j+1)k-1 at the next level. You can also scan the leaf level linearly.