database sharding strategy

views:

289

answers:

+1 Q:

database sharding strategy

For an online marketplace product under construction, I have a situation which requires implementing a database sharding solution. I am new to sharding and after reading the posts in this forum I feel a directory based sharding strategy using business entities will be suitable. But I am still not clear about the denormalization and data synchronization best practices to adopt with such a sharded solution. There will be 3 core entities, supplier, customer and order. I am planning to shard the database based on the supplier id as most of the processing on the order data will be carried out by the supplier admins. This will ensure that the orders for a supplier are fetched from a single db instance eliminating cross db fetches. However, in this case, when the customers view their order information that data will be residing in multiple db instances and will require multi database fetching. What is typically done when such scenarios come up in a sharded solution.

+3 A:

I think there is a 99.9% chance that you do not need sharding.

You need sharding if:

Your database insert /update rate is close to, or is exceeding, the capacity of the highest spec server you can cost-effectively buy AND
You are already farming out most of your read queries, reporting, backups etc on to read-only replicated slaves
You have done functional partitioning to move any nonessential or unrelated update-heavy workloads off your master server

If you cannot definitely say "yes" to all three of the above, you do not need to shard.

Read

http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/

MarkR 2009-11-29 20:03:07

Thanks. I completely agree with you. However, assuming I have to shard, what is the appropriate strategy to do so for the given problem. My ballpark estimates shows the DB will be approximately 1 TB in size without historical / past data.

hbagchi 2009-11-30 06:15:43

I don't think anybody can tell you that, nor can you if you haven't got detailed information on exactly which parts of your application are contending with the database the most. If you're sharding you will presumably have exhausted most other avenues. Depending on the access pattern, 1Tb isn't that big, and may still work on 1 box (with relevant failovers etc)

MarkR 2009-11-30 12:49:03

Database Sharding can be extremely effective, even before your database gets to be multiple TBs in size. The main reason we have found is because the ratio of memory/CPU to disk alters markedly, and the DBMS products such as MySQL are really excellent at putting most recently used indexes and data into memory.

For your data sharding problem, this technique may help.

Parallel Query (we call it "Go Fish" queries). With this idea you can query your customer orders from multiple shards at the same time, and consolidate the results. If its done right this can be very efficient.

For data that doesn't change as much we often recommend Global Table replication for common lookup tables, but that will not help much with something as active as Customer Orders.

In any case, sharding can be implemented in a very cost-effective manner, and can scale linearly for writes, and often better than linearly for reads based on the above.

cisaacson 2010-08-21 21:37:21

You may also want to try out nosql DBs such as mongodb or Cassandra

You can also use memcache to cache data for fast access

You can also look into master slave replication with multiple slaves.

Tyuio 2010-08-21 21:45:11

ansaurus

tags:

views:

answers:

database sharding strategy

related questions