views:

1143

answers:

5

Hi,

I am looking for a lightweight open source paging B+ tree implementation that uses a disk file for storing the tree.

So far I have found only memory-based implementations, or something that has dependency on QT (?!) and does not even compile.

Modern C++ is preferred, but C will do too.

I prefer to avoid full embeddable DBMS solution, because: 1) for my needs bare bone index that can use the simplest possible disk file organization is enough, no need for concurrency, atomicity and everything else. 2) I am using this to prototype my own index, and most likely will change some of the algorithms and storage layout. I want to do that with a minimum of effort. It's not going to be production code.

A: 

I' pretty sure it's not the solution you're looking but why don't you store the tree in a file yourself? All you need is an approach for serialization and an if/ofstream.

Basically you could serialize it like that: go to root, write '0' in your file a divider like '|', the number of elements in root and then all root elements. Repeat with '1' for level 1 and so on. As long as you don't change the level keep the level index, empty leafs could look like 2|0.

DaClown
I am not looking to build the tree in a memory and then serialize it. I am looking to build the tree on disk, having only a subset of its nodes in memory at any given time.
Laurynas Biveinis
You're looking for paging a B-tree then. Maybe you could clarify this in your question?
DaClown
It's the first time I hear that it is considered "paging" tree, but here you go.
Laurynas Biveinis
It's not paging tree. What you have is a data structure of which you want to handle a subset in memory but have the whole structure stored on hard drive. Loading subsets of data that is greater than your memory is called paging.
DaClown
+3  A: 

http://people.csail.mit.edu/jaffer/WB.

You can also consider re-using the B-Tree implementations from an open source embeddable database. (BDB, SQLite etc)

Vijay Mathew
@kastauyra Just take the B-Tree implementation, not the entire DBMS
Vijay Mathew
Possible, but last time I looked into it, it is quite messy to rip out the index out of entire DBMS, just too many dependencies.
Laurynas Biveinis
So what B-tree implementations be readily taken from said databases? I have been actively looking and found *none*.
Joe Soul-bringer
@Joe Soul-bringer For SQLite, look at the interface in btree.h and its implementation in btree.c.
Vijay Mathew
+2  A: 

Faircom's C-Tree Plus has been available commercially for over 20 years. Don't work for them etc... FairCom

There is also Berkley DB which was bought by Oracle but is still free from their site.

Tony Lambert
A: 

You could look at Berkeley DB, its supported ny Oracle but it is open source and can be found here.

Jackson
+1  A: 

I second the suggestion for Berkeley DB. I used it before it was bought by Oracle. It is not a full relational database, but just stores key-value pairs. We switched to that after writing our own paging B-Tree implementation. It was a good learning experience, but we kept adding features until is was just a (poorly) implemented version of BDB.

If you want to do it yourself, here is an outline of what we did. We used mmap to map pages into memory. The structure of each page was index based, so with the page start address you could access any element on the page. Then we mapped and unmapped pages as necessary. We were indexing multi GB text files, back when 1 GB of main memory was considered a lot.

KeithB