I'm working on early designs for an application that needs to start out small but be highly scalable. I'm particularly worried about the user database, which in this case will have a high INSERT and UPDATE load, and is unlikely to survive long on a single master.

(Although my question isn't tied to any particular RDBMS, for the record, we'll be using MySQL, and MySQL Cluster doesn't really meet our needs, so we need to roll our own solution on this one using stock MySQL + InnoDB.)

I'm considering a strategy of distributing users amongst MySQL masters based on a hash of their username (plus an unknown-to-user salt just as added insurance against any funny games). I've seen solutions like this used successfully before, but I've never designed/implemented it myself.
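For concreteness, a rough sketch of the kind of routing I have in mind; the salt, hash choice, and server list below are just placeholders, not our actual setup:

    import hashlib

    # Placeholder values -- the real salt would be kept secret and the master
    # list would come from configuration, not be hard-coded.
    SALT = b"some-secret-salt"
    MASTERS = ["db1.example.com", "db2.example.com", "db3.example.com"]

    def master_for(username):
        """Pick a MySQL master by hashing the salted username."""
        digest = hashlib.sha1(SALT + username.encode("utf-8")).digest()
        # Take the first 8 bytes of the digest as an integer, modulo the
        # number of masters, to get a stable bucket for this user.
        bucket = int.from_bytes(digest[:8], "big") % len(MASTERS)
        return MASTERS[bucket]

    print(master_for("alice"))  # "alice" always routes to the same master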

What I'd like some input on is:

1) Suitable hashing algorithms. I expect SHA-1 or even MD5 would work just fine for this, since cryptographic security really isn't the goal, but I'm not sure if there might be other algorithms out there that might have desirable properties for this sort of problem. Something a little faster might be nice, too.

2) Any major caveats anyone can think of. (I'm already very conscious of the potential connection pool problem, as well as the fun in adding new masters to the pool and migrating affected users.)

Thanks!

+2  A: 

The problem with a hash-based solution is moving users. Consider the following scenario: you have 3 users and 3 servers. User A has a hash that results in their connection being allocated to server A by your software, User B connects to server B, and User C connects to server C. What happens if server B goes down, or you want to migrate user B to a new server, D, because server B is overloaded? You can't, because your software is coded to take a hash of the username and connect to a server based on that hash.

Also, you'll have the problem of distribution: the hashes for users A, B and C may well all resolve to server A, leaving servers B and C sitting idle.

Personally I'd replicate the user-database table amongst all of the servers, then connect to a random server at startup, look up which server actually holds that user's data, and continue from there. That way you can move users easily and, if you replicate data between at least two servers, you have redundancy should a server fall over.
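Roughly, the lookup would be something like this; the table and column names, and the query helper, are just illustrative:

    import random

    # The user_directory table is replicated to every server, so any server
    # can answer "which server actually holds this user's data?"
    SERVERS = ["db-a.example.com", "db-b.example.com", "db-c.example.com"]

    def find_home_server(username, query):
        """Ask a randomly chosen server where this user's data lives.

        'query' stands in for whatever DB access layer you use: a function
        that runs SQL against a given host and returns rows as dicts."""
        bootstrap = random.choice(SERVERS)
        rows = query(bootstrap,
                     "SELECT home_server FROM user_directory WHERE username = %s",
                     (username,))
        return rows[0]["home_server"]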

blowdart
One of my goals is to avoid mass data replication, as I want to avoid the potential for madness when replication doesn't occur in a timely fashion. I'm considering a variant on what you've suggested, though: make the app query a "dedicated" (I'll just run it on the DB server at first when load is low) mapping server -- possibly only running something as simple as memcached. A couple dedicated boxes doing that should be able to scale into the hundreds of thousands of users at _least_.
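Something like this cache-aside lookup; the cache and mapping-store objects here are just stand-ins for memcached and the authoritative mapping database:

    def shard_for(username, cache, mapping_store):
        """Check the cache first, fall back to the authoritative mapping,
        then warm the cache so the next lookup is cheap."""
        shard = cache.get(username)
        if shard is None:
            shard = mapping_store.lookup(username)  # authoritative answer
            cache.set(username, shard)              # cache for next time
        return shard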
Nicholas Knight
+2  A: 

First, I feel duty-bound to tell you that you may not need to shard the thing if you think about the design hard enough. Sharding is a last resort. Here's Percona's Baron Schwartz talking about that (don't miss the direct link to slides): http://www.percona.tv/performance/baron-schwartz-high-performance-mysql-from-a-boring-architecture-ppc-2009

On to the actual advice.

One thing to consider is how you are going to rebalance your application. You start with 3 nodes; at some point you add a fourth. How do you migrate a quarter of the data? By rehashing every user? Probably a bad idea. One approach is to partition the data into a large number of numbered schemas and map the schema numbers to db machines. That way you can migrate by moving just the schemas that need to be touched, rather than rehashing all of your data. Since the number of schemas is orders of magnitude greater than the number of machines, you also don't have to rely on hashing to find the db machine; instead you can use a static mapping that you update for migrations. That also lets you go with a non-uniform distribution of schemas per machine: if some of your users are much more active than others and create a skew, you can manually redistribute the load so that it works better.
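To make that concrete, here's a rough sketch; the schema count, machine names, and mapping are made-up values:

    import hashlib

    # Many more schemas than machines, so rebalancing means moving whole
    # schemas rather than rehashing individual users.
    NUM_SCHEMAS = 1024

    # Static, editable mapping from schema number to database machine. When
    # you add machine "db-d", you move some schemas here and migrate only
    # their data.
    SCHEMA_TO_MACHINE = {n: ["db-a", "db-b", "db-c"][n % 3]
                         for n in range(NUM_SCHEMAS)}

    def schema_for(username):
        """A user's schema number comes from a hash and never changes."""
        digest = hashlib.md5(username.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % NUM_SCHEMAS

    def machine_for(username):
        """The schema-to-machine step is a lookup, not a hash, so you can
        edit it when you migrate schemas between machines."""
        return SCHEMA_TO_MACHINE[schema_for(username)]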

You still need to map users to schemas. An interesting approach I've been reading about is not hashing straight to a single bucket, but using 2 hash functions to get 2 candidate buckets and choosing the less-loaded of the two (by # of users, # of records, or some other metric) as your target. This leads to much more even distributions, but it incurs the overhead of checking both candidates when you need to get a user's data back out. That overhead can be mitigated with caching, but it's something to think about.
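In code, the idea is roughly this; the bucket count, seeds, and load metric (a simple per-bucket user count) are arbitrary choices for illustration:

    import hashlib

    NUM_BUCKETS = 1024
    bucket_load = [0] * NUM_BUCKETS   # e.g. number of users per bucket

    def _bucket(username, seed):
        digest = hashlib.sha1((seed + username).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

    def assign_bucket(username):
        """Hash with two functions and place the user in the less-loaded
        of the two candidate buckets."""
        a, b = _bucket(username, "seed-1"), _bucket(username, "seed-2")
        chosen = a if bucket_load[a] <= bucket_load[b] else b
        bucket_load[chosen] += 1
        return chosen

    def candidate_buckets(username):
        """On reads the user may be in either candidate, so you check both
        (or cache the assignment to avoid the double lookup)."""
        return _bucket(username, "seed-1"), _bucket(username, "seed-2")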

You may want to think about having the shards replicated - probably asynchronously, as a background process, for hot backups.

Finally, have you considered alternative technologies? Various BigTable clones, while they do not offer a relational model, have very nice scaling characteristics for both reads and writes. Check out Cassandra and HBase.

SquareCog
Hardware is cheap, both in money and time -- engineering is not. Obviously we won't shard until we have to, but "have to" comes when our user load suddenly doubles and we don't have time to find more ways to cut corners or make our app shardable. The app needs to be ready for it from the start. At this point I'm considering a variant on blowdart's strategy. As for BigTable clones, our model isn't suited to them, and even if it were, I'm not about to trust my data and uptime to Google wannabes yet.
Nicholas Knight