views: 315 | answers: 3

I am writing a B+ tree for a variety of reasons, and I have come here to ask a question about the implementation of its nodes. My nodes currently look like this:

struct BPlusNode
{
public:
    //holds the list of keys
    keyType **keys;
    //stores the number of slots used
    size_t size;
    //holds the array of pointers to lower nodes; NULL if this is a leaf node
    BPlusNode **children;
    //holds the pointer to the next node to the 'left'
    BPlusNode *next;
    //data page pointers; NULL if this is a branch node
    Bucket **pages;
};

As you can see, my current implementation uses `**` in the places where I am wondering whether I should use `**` or `*`.

I am well aware that `**` requires two dereference operations and is thus slower than simply using `*`. However, this class uses a great deal of recursion, and it is far more convenient to pass pointers to sub-calls of recursive functions. To do this with `*` I would need to do pointer arithmetic and pass the resulting pointer.

With `**`:

void someFunction(BPlusNode* currNode)
{
    // ...
    someFunction(currNode->children[ChildIndex]);
}

With `*`:

void someFunction(BPlusNode* currNode)
{
    // ...
    someFunction((currNode->children) + ChildIndex);
}

I can see that there is an additional memory read to produce the desired pointer in the `**` version, but the `**` version is also easier for me to think about (it conforms more closely to the way the diagrams are drawn in "The Art of Computer Programming" and on Wikipedia).

Does anyone have any thoughts one way or the other? Suggestions for a third option? Proof of why one is superior to the other? etc?

Edit:
I might post this as an answer below, but I just realized that with the `**` scheme I do not need to copy the entire contents of each subnode or bucket should I want to insert one into the middle of the array (i.e. resize the array). If there are 20 subnodes, then with the `*` scheme reallocating the array requires copying 20*sizeof(BPlusNode) bytes, as opposed to 20*sizeof(BPlusNode*) bytes for the `**` scheme.

On the other hand, it has occurred to me that since all the inserts and page splits are performed up front, this increased efficiency in performing them may be unnecessary, and the benefit of `*` over `**` in searches may outweigh it.

+1  A: 

Using **, you need an extra allocation step to hold each BPlusNode* child pointer. Or you could allocate a block of them and just have each pointer in children point to sequential BPlusNode* elements inside this block -- but it's still one extra dynamic memory allocation per node creation (and a corresponding extra deallocation step on destruction). So I would absolutely recommend using a single *. If writing

someFunction((currNode->children) + ChildIndex);

hurts you, you can rewrite it as

someFunction(&currNode->children[ChildIndex]);

which I find clearer.

j_random_hacker
But isn't doing it that way the same as doing the pointer arithmetic, dereferencing the result of the arithmetic, and then retrieving the `this` pointer of the node? Also, I only need to do insertions to the tree, all in one rush; in fact there will be about 2.557 billion insert calls with the current data set (though, due to page structure, only about 255,700 buckets). Since the tree is only of order 256 there is going to be a lot of splitting in that initial rush, so wouldn't we benefit from the increased ease of resizing the arrays?
James Matta
j_random_hacker
James Matta
Insertions in the middle is one reason to prefer `**`. But I thought that the idea of B-trees was that they only grew to a (fairly small) fixed number of child nodes before splitting -- is that right? If so, and if that threshold is less than a few hundred nodes, I'd bet that it is still faster to just copy the node pointers around when inserting in the middle. `memcpy()` is fast; per-child-insertion dynamic memory allocation is not (and also will more than double your memory requirements, since each allocation requires storage of maybe 4-16 bytes of housekeeping info).
j_random_hacker
Well, as I said before the current data set is about 2.557 billion events, which means that there will be that many insert calls. Fortunately a bucket is just a set of indexes into a file and each bucket holds about 10000 events. So I would wind up with about 255,700 buckets and of course a commensurately smaller number of leaf nodes.
James Matta
In another comment you indicate that `keyType` is a "big, ugly thing" -- in that case, it probably is better to use a 2nd level of indirection (i.e. `**`) to enable efficient insertion into the middle of the array. That is unless `keyType`'s constructor itself does dynamic memory allocation, in which case the actual object size will be small and the insertion logic can do a shallow copy (copy pointers inside of the `keyType` objects instead of the objects themselves).
j_random_hacker
BTW it occurs to me that you appear to be implementing a very simple database here -- have you thought about using a preexisting DB instead of writing your own? They are basically all B+-tree based, performant, thoroughly tested, and will give you much more flexibility for querying.
j_random_hacker
@j_random_hacker: I have found that too many DBs like to duplicate the indexing data. With your own BTree there is only one copy. Of course, some DBs do offer index-only "tables" without the table, but I remember all the ones I found were expensive.
Zan Lynx
+2  A: 

I would define another struct for the key and pointer data, and I would commit to using fixed-size nodes that match your on-disk structure. This makes memory-mapping the tree a lot easier.

Your BPlusNode struct then becomes a handle class that points to these mapped data nodes and synthesizes things like the prev and next pointers by reading the siblings as it descends the tree.

It could look something like the following:

enum BPlusNodeType {
    LEAF, BRANCH
};

struct BPlusNodeData {
    static const size_t max_size = 511; // Try to fit into 4K? 8K?
    uint16_t size;
    uint16_t type;
    keyType key[max_size];
    union {
        Bucket* data[max_size];
        BPlusNodeData* children[max_size];
    };
};
Zan Lynx
+1. A fixed-size, in-place array of node pointers will be faster and cleaner (no dynamic memory (de/re)allocations to forget) than either a `*`- or `**`-based solution.
j_random_hacker
I rather like the elegance here but I am a bit fuzzy on the syntax. I know how a union works in the second context but how does the first union declaration fit in with anything?
James Matta
Also, what impact would it have on how you do this if I told you that keyType is a big ugly thing itself? Unfortunately, keyType is really something I call numericalArray; it is essentially a dynamic array for storing 1- to n-dimensional coordinates and then converting those coordinates to a 1D index with a Hilbert space-filling curve (using some bitwise magic). Does that impact how you would do it?
James Matta
Perhaps you meant enum instead of union when you are defining BPlusNodeType?
James Matta
@James: Yes, I'm sure he meant `enum` there. I suggest -1ing him to get his attention.
j_random_hacker
Waited too long; my upvote cannot be changed. I will ride with it meaning enum for now.
James Matta
I edited it myself. Don't think he'll mind :)
j_random_hacker
Yes thanks, I did mean enum there. That union was screwy.
Zan Lynx
@James Matta: If you have converted to a 1D index key value and that index isn't too long, then I would choose a fixed size for each key and stay with a fixed size array.
Zan Lynx
A: 

Would you be better off using STL `vector<keyType *> keys` and `vector<BPlusNode *> children`, and so on?

It might be too simplistic, but my impression is that double-indirection is not often needed in C++ (and not all that often in C, though more often than in C++).

Jonathan Leffler
Using the STL slows things down yet more. This program has to handle a massive number of reads from the index (and an even more massive number from the disk). I want queries to this tree to be as fast as possible.
James Matta
Finally, using `std::vector<keyType *>` etc. makes nodes larger. Vector has some space overhead associated with it. That is a bit less of an issue when I only have a couple billion events, and thus a couple hundred thousand pages and only a thousand nodes or so (despite keyType taking up substantial room on its own). But when one looks at a situation where there are several hundred billion or, worse, a couple trillion events (amounts that are sometimes necessary), one finds that as little memory overhead as possible is best.
James Matta
@James: Reading/writing `vector<keyType*>` will be no slower than reading/writing a dynamically-allocated array of `keyType*` -- any decent compiler will generate identical code for both. OTOH the `vector` space overhead is real (though I doubt significant). Normally I'd advocate replacing any fixed-size or dynamically-allocated array with a `vector`, but in this case I think it's appropriate to go for Zan Lynx's fixed-size array approach, which will be both smaller and faster.
j_random_hacker