views:

374

answers:

5

I am looking to write a Key/value store (probably in python) mostly just for experience, and because it's something I think that is a very useful product. I have a couple of questions. How, in general, are key/value pairs normally stored in memory and on disk? How would one go about loading the things stored on disk, back into memory? Do key/value stores keep all the key/value pairs in memory at once? or is it read from the disk?

I tried to find some literature on the subject, but didn't get very far and was hoping someone here could help me out.

+3  A: 

If you're doing a key/value store in Python for learning purposes, it might be easiest to start with the pickle module. It's a fast and convenient way to write an arbitrary Python data stream to a persistent store and read it back again.

Greg Hewgill
+3  A: 

you can have a look at 'Berkley db' to see how how it works, it is a key/value DB, so you can use it directly, or as it is open-source see how it handles persistence, transactions and paging of most referred pages.

here are python bindings to it http://www.jcea.es/programacion/pybsddb.htm

Anurag Uniyal
This answer is a what I was coming in here to write. BDB is used everywhere and implements solutions to a large number of storage problems.
z8000
I don't think he'll have much fun reading through the code, though.
Seun Osewa
+2  A: 

Amazon released a document about Dynamo - a highly available key-value storage system. It mostly deals with the scaling issues (how to create a key/value store that runs on a large number of machines), but it also deals with some basics, and generally worth to read.

Ofri Raviv
+7  A: 

It all depends on the level of complexity you want to dive into. Starting with a simple Python dict serialized to a file in a myriad of possible ways (of which pickle is probably the simplest), you can go as far as implementing a complete database system.

Look up redis - it's a key/value store written in C and operating as a server "DB". It has some good documentation and easy to read code, so you can borrow ideas for your Python implementation.

To go even farther, you can read about B-trees.

For your specific questions: above some DB size, you can never keep it all in memory, so you need some robust way of loading data from disk. Also consider whether the store is single-client or multi-client. This has serious consequences for its implementation.

Eli Bendersky
If i could upvote this twice I would. This is the kind of answer I'm looking for. Thank you.
The.Anti.9
well, if you like it so much, you can accept it ;-)
Eli Bendersky
+4  A: 

Have a look at Python's shelve module which provides a persitent dictionary. It basically stores pickles in a database, usually dmb or BSDDB. Looking at how shelve works will give you some insights, and the source code comes with your python distribution.

Another product to look at is Durus. This is an object database, that it uses it's own B-tree implementation for persistence to disk.

mhawke