views:

210

answers:

3

What is the easiest way to simulate a database table with an index in a key value store? The key value store has NO ranged queries and NO ordered keys.

The things I want to simulate (in order of priority):

  1. Create tables
  2. Add columns
  3. Create indexes
  4. Query based on primary key
  5. Query based on arbitrary columns
+2  A: 

Use a hashtable or dictionary. If you want unique key values you could use a GUID or hashcode.

Tony
Do you mean a hastable for the primary key? And where does a dictionary fit in?
Zubair
In fact, I think a hashtable will probably suit you better. Dictionary is just another possible option.
Tony
A: 

The key-value store should support ordering the keys and ranged access to the keys.

Then you should create two dictionaries:

id -> payload

and

col1, id -> NULL

, where payload should contain all the data the database table would contain, and the keys of the second dictionary should contain the values of (col1, id) from each entry of the first dictionary.

Quassnoi
The key value store I am using does not support ordering of keys or ranged access.
Zubair
Also, could you rephrase the answer, I didn't understand the part about dictionaries. Also what is col1?
Zubair
+1  A: 

If you use Redis (an advanced key-value store that supports strings, lists, sets, etc.) Then this is quite easy. I have already developed a C# redis client that has native support for storing POCO's data models. These exact same POCO's can be used by OrmLite to store it in a RDBMS.

By the way Redis is fast, I have a benchmark that stores and retrieves the entire Northwind Database (3202 records) in under 1.2 seconds (running inside a UnitTest on a 3yo iMac).

I store entities in two ways

  • Distinct entities, where I combine the Class type name and Primary Key to create a unique key e.g. urn:user:1
    • I then maintain a separate set of primary keys (in a Redis Set) to keep track of all my entities, using a key like: ids:user
  • In a Redis server side list - which acts very much like a table with support for paging, using a key like: lists:user
mythz
I used Redis in the past, and yes, it is blindingly fast, I loved it, especially all the range, set and atomic (incr and decr) operations it has. Its just that now I am dealing with a system which stores several billion records in a dynamically expandable database (Riak), so Redis isn't quite right for this, as everything has to fit into memory
Zubair
Actually redis trunk includes a VM which will allow you to store 10 million keys (with any sized value) using only 1.6GB memory, while still remaining blazingly fast - see: http://antirez.com/post/redis-virtual-memory-story.html
mythz
Yeah, I have seen that, very cool!! In fact, even Riak has a Redis based backend. It still doesn't scale to billions of records though, but very nice.
Zubair
Isn't billions of records a hardware problem? We've got entry-level dell servers with 64GB RAM. 64GB ~= 400M keys. So 3 servers = 1.2 billion records. Use a client with consistent hashing and you're there.
mythz
Yes that is one possibility. But also once you start running Riak range queries and ordered lists over the cluster you have to perform distributed merges and stuff like that, and the performance goes down 1000 fold (something I have tried). I would recommend trying it though. In another project we did such what you described, but we had nine servers (3 copies of the data). We ended up handcoding all the index and range queries so they worked over the cluster in the end as Redis doesn't support these yet. Looking back we should have used MongoDB for that project
Zubair
I guess that is a question in itself. IF there is a way I can do fast range queries in redis and get fast ordered lists then I could use it!
Zubair
http://stackoverflow.com/questions/2273726/what-is-the-most-efficient-way-to-perform-a-range-query-over-a-cluster-of-nodes
Zubair
It sounds like you may be able to use Redis 'ordered sets' and 'ZRANGEBYSCORE' in order to get a dataset range by score value.
mythz
How will the score help with the merge of the resultsets (or are we talking about different things now)??
Zubair
We might be. If you can compile your query for each value into a 'score' that you store in an 'Ordered Set', you can then make a single call to each Redis instance to get all values that match that score (i.e. your query). A union of these results should contain all values that match your score.
mythz
I'll admit, I did not know about this "score" thing. Is it an algorithm I can read about on the web somewhere, or is ot something you have invented?
Zubair
Nah it's just using a double to order values in a sorted set. For a good explanation see: http://groups.google.com/group/redis-db/browse_thread/thread/a3d9b4743017e90b/07585626443bd275
mythz
I read the thread but still didn't understand it. is there a simple explanation of it anywhere on the web, what should I even be googling, is it "Double Order search"?
Zubair