views:

35

answers:

2

Say I have a MySQL table:

CREATE TABLE tweets (
tweet_id INT NOT NULL AUTO_INCREMENT,
author_id INT NOT NULL,
text CHAR(140) NOT NULL,
PRIMARY KEY (tweet_id)
)
PARTITION BY HASH(tweet_id)
PARTITIONS 12;

All is good. The table lives on a single server - Server1. But eventually I may want to scale out. So I'd want to shard the table and move 6 of the 12 partitions onto a new server - Server2.

I'd want:

  • Server1 to contain odd-numbered tweets: partitions 1, 3, 5, 7, 9, 11
  • Server2 to contain even-numbered tweets: partitions 2, 4, 6, 8, 10, 0

1) What is the best way to move those partitions from Server1 to Server2? I need to make sure the values of the auto-increment tweet_id's remain unchanged during the migration.

2) Now that I have 2 servers, how do I make sure the auto-increment tweet_id's generated by the 2 servers don't have the same value? I'd also need to make sure the tweet_id on each partition stays consistent, i.e. on Partition k every tweet_id's modulo 12 equals to k.

3) Ideally I'd like to continue this scale out process. So later on I'd want to add a 3rd server - Server3. I'd want to re-balance the partitions so that there're 4 partitions on each server. Again how do I make sure the auto-increment tweet_id's generated by the 3 servers are distinct and that the modulo 12 of tweet_id's stay consistent within each partition?

+2  A: 

First of all, I would suggest not using AUTO_INCREMENT for tweet_id. The Twitter API gives you an ID with the tweet which is already guaranteed to be unique. You can also use this to reference the tweet via the API later if you choose. However, it sounds like it may be too late for that if you already have a lot of data collected.

Look at the auto_increment_offset and auto_increment_increment system variables. You can use those to ensure your autoincrement IDs don't conflict with each other. Basically, you want to set auto_increment_offset to a number greater than all existing IDs, but set it one higher on the second server. Then, set auto_increment_increment to 2. This will ensure that one server generates all odd IDs and the other generates all even IDs. To keep scaling up, just adjust these values accordingly.

Generally speaking, the partitions features in MySQL aren't designed for scaling out. Your application will need to handle the logic of querying multiple servers if you need to look across partitions.

Your best bet to split up the data is to select ranges of tweet IDs to put on each server. It probably makes sense in your case to grab the first half or so of tweet IDs and put them on server 2. Then server 1 can stay live until server 2 (and your new application logic) are ready to go).

Michael Mior
A: 

You might want to take a look at dbShards, which handles these issues for you. Auto increment is supported with unique values across all shards and you can use modulus to map keys to virtual shards rather than tying them directly to physical shards. This makes it easier to add new shards. You can read more at http://www.dbshards.com/dbshards/.

Regards,

Andy.

Andy Grove