+2  Q: 

Data Sharding

I'm interested in sharding my website's user data across multiple servers.

For example, users will log in from the same place, but the login script needs to figure out which server that user's data resides on. So the login script would query the master registry for that username, and it might return that the data is on server B. The login script would then connect to server B and verify the username/password. Does that make sense? Is it normal to have something like a master registry to resolve where data resides?

Also, I've searched but haven't had much luck finding tutorials, information, or strategies on sharding. If you are aware of any online resources on the topic, I would greatly appreciate it if you would share them so that I can educate myself. Thanks!

+1  A: 

One option you might want to consider: use a simple hash. For example, take the MD5 hash of the username, then treat the last 8 bytes of that as a long. Take that long mod (number of servers) and make that the server to put the data on. That way you don't need any central registry/configuration other than an ordered list of servers.

The disadvantage is that changing the number of servers involves moving all the data to the new "correct" location...

(There's also the matter that if one machine goes down, those users are stuffed - you'll want to consider having some sort of redundancy.)

Jon Skeet
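A minimal sketch of the hash-and-modulo idea from this answer, in Python. The host names in `SERVERS` and the helper names `shard_index`, `server_for`, and `moved_fraction` are hypothetical, chosen only for illustration. The sketch reads the last 8 bytes as an unsigned integer (an assumption; a signed long would need care with negative remainders).

```python
import hashlib

# Hypothetical ordered list of shard hosts. Every client must use the
# same list in the same order, or lookups will disagree.
SERVERS = ["db-a.example.com", "db-b.example.com", "db-c.example.com"]


def shard_index(username: str, num_servers: int) -> int:
    """Map a username to a shard number in [0, num_servers)."""
    digest = hashlib.md5(username.encode("utf-8")).digest()  # 16 bytes
    # Treat the last 8 bytes of the digest as an unsigned 64-bit integer.
    value = int.from_bytes(digest[-8:], byteorder="big")
    return value % num_servers


def server_for(username: str) -> str:
    """Return the (hypothetical) host that holds this user's data."""
    return SERVERS[shard_index(username, len(SERVERS))]


def moved_fraction(usernames, old_count: int, new_count: int) -> float:
    """Fraction of users whose shard changes when the server count changes."""
    moved = sum(
        1 for u in usernames
        if shard_index(u, old_count) != shard_index(u, new_count)
    )
    return moved / len(usernames)
```

Calling `moved_fraction` with a sample of usernames for a change from 3 to 4 servers illustrates the disadvantage mentioned above: with a uniform hash, about three quarters of users would land on a different server.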
"The disadvantage is that changing the number of servers involves moving all the data to the new "correct" location..." Note that this can be dealt with using Consistent Hashing.
Dana the Sane
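For readers unfamiliar with consistent hashing, here is a rough sketch of one common form of it (a hash ring with virtual nodes). The class name `HashRing`, the `vnodes` parameter, and the server names are hypothetical, and this is an illustration of the general technique rather than any particular library's implementation.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Position of a key on the ring (last 8 bytes of MD5, unsigned)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[-8:], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes: int = 100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        """Place `vnodes` points for this server on the ring."""
        for i in range(self._vnodes):
            point = _hash(f"{server}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision; keep existing owner
            self._owner[point] = server
            bisect.insort(self._points, point)

    def lookup(self, username: str) -> str:
        """Return the server owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring has no servers")
        i = bisect.bisect_right(self._points, _hash(username))
        return self._owner[self._points[i % len(self._points)]]
```

When a server is added to such a ring, a user's owner either stays the same or becomes the new server, so only the new server's share of the data has to move rather than nearly all of it.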
+2  A: 

You should check the very informative http://highscalability.com site. Posts worth reading:

Generally you are following the right approach, but this can get nasty quite fast if you need to run queries across more than one cluster, e.g. "your friends' recent posts" type queries.

mdorseif
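To illustrate why cross-shard queries get awkward, here is a toy sketch that combines the master-registry lookup from the question with a "friends' recent posts" query. Everything here is hypothetical: the in-memory `REGISTRY` and `SHARDS` dictionaries stand in for real databases, and the function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical master registry: username -> shard name.
REGISTRY = {"alice": "A", "bob": "B", "carol": "B", "dave": "C"}

# Hypothetical shards, each holding posts as (timestamp, username, text).
SHARDS = {
    "A": [(105, "alice", "hello")],
    "B": [(101, "bob", "first"), (109, "carol", "lunch"), (103, "bob", "second")],
    "C": [(107, "dave", "hi")],
}


def shard_for(username: str) -> str:
    """Single-user case: one registry lookup tells us where to connect."""
    return REGISTRY[username]


def posts_on_shard(shard: str, usernames):
    """Stand-in for one query sent to one shard."""
    wanted = set(usernames)
    return [post for post in SHARDS[shard] if post[1] in wanted]


def recent_posts(friends, limit: int = 3):
    """Multi-user case: group friends by shard, query each, merge in the app."""
    by_shard = defaultdict(list)
    for friend in friends:
        by_shard[shard_for(friend)].append(friend)

    merged = []
    for shard, names in by_shard.items():  # one round trip per shard involved
        merged.extend(posts_on_shard(shard, names))

    merged.sort(key=lambda post: post[0], reverse=True)  # newest first
    return merged[:limit]
```

A login only touches one shard, but `recent_posts(["bob", "carol", "dave"])` has to contact every shard that holds one of the friends and then sort and trim the combined results in application code.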