I am looking for a reasonably well tested library+server to store a persistent distributed hash table.

I am hesitant to use SQL-based solutions because the data is highly document-oriented: millions of ~64KB blobs with only a single index (a hash of the blob itself). It also needs to be distributable for long-term scaling.

Due to expense and bandwidth considerations, external solutions such as S3 are not an option.

Something like CouchDB or Project Voldemort would be ideal; however, there is a noticeable lack of .NET bindings for both (PV can be IKVMC'd from Java, but that has "issues"). Both key and value are byte arrays (the key is 16 bytes; the value is up to 2048KB, averaging 64KB).

So far I have searched for some kind of .NET port of Dynamo, Chord and similar; however, the majority of results appear to be purely in-memory caches and lack any form of persistence or replication.
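For readers unfamiliar with the Dynamo/Chord style of partitioning mentioned above, here is a minimal sketch of a consistent-hash ring, written in Python only for brevity. The class name `HashRing`, the `vnodes` and `replicas` parameters, the node names, and the choice of BLAKE2b with a 16-byte digest are all assumptions made for illustration; none of them comes from Dynamo, Chord, or any library named in this thread.

```python
import bisect
import hashlib


class HashRing:
    """Hypothetical consistent-hash ring: maps a key to the nodes that store it."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring at `vnodes` pseudo-random
        # points so that load stays roughly even when nodes join or leave.
        self._ring = []
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._point(f"{node}#{i}".encode()), node))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _point(data):
        # 128-bit position on the ring (assumed hash function: BLAKE2b-128).
        return int.from_bytes(hashlib.blake2b(data, digest_size=16).digest(), "big")

    def nodes_for(self, key, replicas=2):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect_right(self._points, self._point(key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.nodes_for(bytes(range(16)), replicas=2))
```

The ring only decides placement. The persistence and replication asked about here would still have to be implemented on each node, which is exactly the part the in-memory caches leave out.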

Anyone got any ideas or suggestions?

+1  A: 

Consider MS Velocity.

Summary: “Velocity” is a distributed in-memory application cache platform for developing scalable, available, and high-performance applications. “Velocity” fuses memory across multiple computers to give a single unified cache view to applications. Applications can store any serializable CLR object without worrying about where the object gets stored. Scalability can be achieved by simply adding more computers on demand. “Velocity” also allows for copies of data to be stored across the cluster, thus protecting data against failures. “Velocity” can be configured to run as a service accessed over the network or can be run embedded with the distributed application.

JasonRShaver
Velocity is an in-memory cache only; AFAIK it lacks any form of long-term persistence.
Adam Frisby
+7  A: 

Take a look at Ayende's Rhino DHT. It might be more in line with what you are looking for. The source can be acquired here.

Harry Steinhilber
Ayende has also started a series on document database design: http://ayende.com/Blog/archive/2009/03/17/designing-a-document-database-what-next.aspx
David Robbins
+2  A: 

DryadLINQ or Hadoop.Net may help.

Hadoop.Net is a .NET version of Hadoop. More about Hadoop can be found here

Harsha Hulageri
Hadoop.Net appears to not be going anywhere. Nothing is posted on the Google Code site and the SVN tree is at revision 1 with no data.
Joe Doyle
I have been using DryadLINQ for large-scale distributed analytics and it's very solid. It has a distributed data model, though it is geared heavily towards iterating over the entire data set for analytics, not so much towards fast distributed lookups.
Turbo
+2  A: 

I actually think you should consider SQL Server 2008. Store the data in a table with a varbinary(max) column, along with a column that contains the hash of that column. Index the hash, as you suggested.

You'll then be able to use the various distribution features of the product.
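As a rough sketch of the table layout described above (a blob column plus an indexed hash column), the following uses Python's built-in `sqlite3` module as a stand-in for SQL Server 2008, so that it runs without a server. The table name `blobs`, the helpers `open_store`, `put` and `get`, and the choice of BLAKE2b with a 16-byte digest are assumptions for illustration; the question only says the key is a 16-byte hash of the blob. In SQL Server the `data` column would be `varbinary(max)`.

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the blob table; the primary key doubles as the hash index."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        " hash BLOB PRIMARY KEY,"  # 16-byte digest of the data
        " data BLOB NOT NULL)"     # the blob itself
    )
    return conn


def put(conn, data):
    """Store a blob under the hash of its content and return that key."""
    key = hashlib.blake2b(data, digest_size=16).digest()  # assumed hash function
    # Identical content hashes to the same key, so a repeat insert is a no-op.
    conn.execute("INSERT OR IGNORE INTO blobs (hash, data) VALUES (?, ?)", (key, data))
    conn.commit()
    return key


def get(conn, key):
    """Fetch a blob by its 16-byte key, or None if it is not stored."""
    row = conn.execute("SELECT data FROM blobs WHERE hash = ?", (key,)).fetchone()
    return None if row is None else bytes(row[0])


conn = open_store()
key = put(conn, b"x" * 65536)  # a 64KB blob, the average size in the question
print(len(key), len(get(conn, key)))
```

This covers only the single-server storage scheme. The distribution features mentioned above are specific to the product and are not modelled here.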

John Saunders