tags:

views:

110

answers:

3

I heard that B-Tree datbases are faster than Hash tables, so I thought of using a B-Tree Db for my project. Is there any existing framework in python which allows us to use such Data structure or will I have to code from scratch?

+1  A: 

Program what you are trying to do first, then optimize if needed. Period.

EDIT:

http://pypi.python.org/pypi/blist

Drop in replacement for python's built in list.

Technically this is a part of my program , i dont want to use conventional DB like MySQL .. and i have been told to keep in mind that data insertion will be in large sets
Rahul
So the constant lookup/access time offered by hash tables is not fast enough for what you are doing and you are looking towards b-trees to speed things up? I suggest reading about b-trees and hashes before asking questions about them.
well i did some basic literature survey and came across this http://www.igvita.com/2009/02/13/tokyo-cabinet-beyond-key-value-store/the statistics mentioned embolden me to go for B-Trees , unfortunately there isnt a python implementation of program.
Rahul
The scaling considerations of tokyo cabinet do not apply to your project. even if you need to scale out to hundreds of nodes, you will need to optimize for your own case.
TokenMacGuy
...and that article shows better times for hashes than b-trees in every instance.
A: 

SQLite3 uses B+ Trees internally, but it sounds like you may want a key-value store. Try Berkeley DB for that. If you don't need transactions, try HDF5. If you want a distributed key-value store, there is also http://scalien.com/keyspace/, but that is a server-client type system which would open up all sorts of NoSQL key-value stores.

All of these systems will be O(log(n)) for insertion and retrieval, so they will probably be slower than the hash tables that you're currently using.

Kyoto Cabinet offers a hash tree, so that may be more of what you're looking at since it should be O(1) for insertion and retrieval, but you can't do in-order traversal if you need that (although since you're currently using hash trees, this shouldn't be an issue).

http://fallabs.com/kyotocabinet/

If you're looking for performance, you will need to have the speed critical items implemented in a compiled language and then have a wrapper API in Python.

Eric Holmberg
+1  A: 

The only reason to choose a B-Tree over a hash table, either in memory or with block storage (as in a database) is to support queries other than equal. A b-tree permits you perform range queries with good performance. Many key-value stores (such as berkley db) don't make this externally visible, though, because they still hash the keys, but this still lets you iterate over the whole dataset quickly and stably (iterators remain valid even if there are adds or deletes, or the tree must be rebalanced).

If you don't need range queries, and you don't need concurrent iteration, then you don't need b-trees, use a hash table, it will be faster at any scale.

TokenMacGuy