tags:

views:

1367

answers:

5

Could any one give an explanation on how a DHT works.

Nothing too heavy, just the basics.

+2  A: 

DHTs provide the same type of interface to the user as a normal hashtable (look up a value by key), but the data is distributed over an arbirtary number of connected nodes. Wikipedia has a good basic introduction that I would essentially be regurgitating if I write more -

http://en.wikipedia.org/wiki/Distributed_hash_table

Pete
A: 

Check out this Wikipedia article Or this powerpoint slide

Vijesh VP
+1  A: 

Memcached is one particular implementation of a DHT used for caching. See how it works here (search for "how memcached works")

Mauricio Scheffer
+8  A: 

Ok, they're fundamentally a pretty simple idea. A DHT gives you a dictionary-like interface, but the nodes are distributed across the network. The trick with DHTs is that the node that gets to store a particular key is found by hashing that key, so in effect your hash-table buckets are now independent nodes in a network.

This gives a lot of fault-tolerance and reliability, and possibly some performance benefit, but it also throws up a lot of headaches. For example, what happens when a node leaves the network, by failing or otherwise? And how do you redistribute keys when a node joins so that the load is roughly balanced. Come to think of it, how do you evenly distribute keys anyhow? And when a node joins, how do you avoid rehashing everything? (Remember you'd have to do this in a normal hash table if you increase the number of buckets).

One example DHT that tackles some of these problems is a logical ring of n nodes, each taking responsibility for 1/n of the keyspace. Once you add a node to the network, it finds a place on the ring to sit between two other nodes, and takes responsibility for some of the keys in its sibling nodes. The beauty of this approach is that none of the other nodes in the ring are affected; only the two sibling nodes have to redistribute keys.

For example, say in a three node ring the first node has keys 0-10, the second 11-20 and the third 21-30. If a fourth node comes along and inserts itself between nodes 3 and 0 (remember, they're in a ring), it can take responsibility for say half of 3's keyspace, so now it deals with 26-30 and node 3 deals with 21-25.

There are many other overlay structures such as this that use content-based routing to find the right node on which to store a key. Locating a key in a ring requires searching round the ring one node at a time (unless you keep a local look-up table, problematic in a DHT of thousands of nodes), which is O(n)-hop routing. Other structures - including augmented rings - guarantee O(log n)-hop routing, and some claim to O(1)-hop routing at the cost of more maintenance.

Read the wikipedia page, and if you really want to know in a bit of depth, check out this coursepage at Harvard which has a pretty comprehensive reading list.

HenryR
+1  A: 

Hi,

Just after coming across this post - Just wondering two things with regard to DHTs:

  1. Is it a bit unrealistic to assume they will always know their key? i.e. to find a file you must have the exact filename e.g. tvProg.txt which you will then hash into a number e.g. 32. If you hashed any variation on this it would not work...Therefore if a user is looking for a type of information as opposed to a specific file name, the DHT won't work - Right?

  2. Also, files exist on certain nodes (This is where they're physically stored). Then according to the DHT the file names are hashed into keys and those keys are stored at the nodes whose ID is closest to them in the DHT (or some other such method depending on the algorithm). My question is: Is the original file then actually moved from its original storage spot and transferred to this new node? If so then why don't DHT research papers consider this when evaluating scalability - surely moving the files onto different node creates alot of overhead - I don't see why its necessary and I presume its a huge problem when the nodes are modile?

Thanks in advance. I'm quite conufsed by these two points - any help appreciated

2. The node, that is responsible for a certain key, stores the appropriate value. That doesn't have to be the file itself. It can also be a reference to the file.Think of an example like this:You have a file you want to offer in the DHT. You hash the file and tell the other node, that is responsible for your key, to store a reference to yourself.If somebody is looking for that file now, he gets the reference from the responsible node and will come back to you, to download the file :-)
Peter Wippermann