views:

60

answers:

3

I have been playing with the very cool btree applet at slady.net. I'm having trouble understanding a particular behavior. Take a look at this starting state:

alt text

This particular state was arrived at by inserting the following sequence: 10, 15, 30, 16, 70, 1, 9, 27, 45, 50, 55.

My question is regarding what happens to the [45, ] node when I insert the next value in the sequence, 65.

alt text

The [55,70] node will be split by the 65, and being the middle value, the 65 will travel back up and then split the [30,50] node as well. My question is: Why does the [45, ] node end up a child of the [30, ] node? It's parent originally had 3 children, the left most and the right most became new seperate nodes. The 45 was between those values and it seems like it could have ended up under the [65, ] node just as well... Why?

+2  A: 

It makes no sense for the 45 node to be a child of the 65 node in the second diagram as the rightmost branch is for values > 50 (based on the root node's right-most value) - so 45 has to go into the middle branch somewhere.

Will A
Yeah, I agree... and when I did it by hand, I put it in the right spot. The problem is "makes no sense" is hard to code! Everything else so far has been simple rules (split full nodes, splits go up, etc)...
dicroce
Have you looked into the B-tree insertion rules? I'd recommend looking into some descriptions of B-tree rules rather than deriving logic from an applet.
Will A
@dicroce - the basic fact is that btrees are fiddly. They get worse when, with larger nodes, you want to delay node-splits and node-merges by redistributing keys between adjacent sibling nodes, which also means the key in the parent node *between* the two siblings will change too. I find it helps to think more in terms of an ordered array of all items, with brackets to delimit nodes, such as [[9] 15 [30] 50 [65]] (top two layers of your after example) - but only to a point.
Steve314
@diocre - B+ trees are easier. All items are in the leaf nodes, which are all at the same depth. Branch nodes only hold "subtree summaries" - such as the standard "lowest key in subtree" to support key-based search. The leaf nodes typically form a list, so you don't have to touch the branch nodes at all for sequential access. And splits, merges and redistributions are a little easier because the items are all in leaf nodes - propogating keys and other summary updates upwards through the branch nodes is relatively simple. Of course branch nodes do split/merge/etc still.
Steve314
+1  A: 

Each node always has n+1 children, where n is the number of keys in that node.

Before splitting, the [30, 50] node has two keys and three children, as expected. When you split that, you end up with a [30, -] node and a [65, -] node (and push the 50 up a level).

At the next level down, you have the (previously existing) [16, 27] and [45, -], and the newly-split [55, -] and [70, -] nodes.

You have two parent nodes and four children. Each parent must have two children because it has a single key. Therefore, given the ordering rules, [45, -] must be a child of [30, -] because otherwise (1) [30, -] wouldn't have enough children, and (2) [65, -] would have too many children. [EDIT - not illegally too many for a node, but the split is supposed to be balanced].

Will A is also correct. This is a consequence of choosing to push up the 50 key when splitting the middle-layer node, but this wasn't really a choice. Splitting to either produce [-, -] and [50, 65] pushing up 30, or [30, 50] and [-, -] pushing up 65 would violate the rule that every node must be at least half full.

Steve314
+5  A: 

A picture is worth a thousand words; an animation is worth a million:

The key thing to notice here is that when the center node 50 is pulled up, it has to throw away its right child because it's too far down. However, 65 needs a new left child, so 50 hands 45 over to 65. 50 now needs a new right child, and the node containing 65 needs to be childed, so it takes the newly formed 65 in its place.

Here are illustrated B-tree insertion rules (where maximum node size is 4 items, 5 child nodes):

The xr won't exist and won't matter if you're inserting into a leaf (which you do first). However, if you have to split a node in half, the new x is the center item you pulled out, and the new xr is the right child of x.

Joey Adams
Hmm, apparently, it seems your animation was worth about 69 words. ;P +1, good answer. ;)
jrista
Wow, you actually made an animation for my case? I wish I could upvote more than once!
dicroce