I ran across a comment that made me wonder: If you use a sharding approach to db scalability, does that mean the database is denormalized? Can you have a normalized, sharded database?

+3  A: 

They are not mutually exclusive. Both are often used when scaling massive datasets, but one doesn't really have much to do with the other. You can absolutely have a sharded, normalized database...or a denormalized, nonsharded database.

In sharding, you're just taking a given schema (normalized or not) and distributing its data across a number of physical/logical data stores. For example, all your users with a particular characteristic (e.g., last name in 'A-D') can live on a given database instance. Note that HOW you shard your database is a crucial decision and will depend heavily on your implementation.
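To make the last-name example concrete, here is a minimal sketch of range-based shard routing. The ranges, the instance names (`users_db_1` and so on), and the `shard_for` helper are all made up for illustration; a real system would likely read the mapping from configuration and might shard on a different key entirely.

```python
# Hypothetical shard map: (first initial, last initial, instance name).
# The ranges and instance names are assumptions, not from any real system.
SHARD_RANGES = [
    ("A", "D", "users_db_1"),
    ("E", "M", "users_db_2"),
    ("N", "Z", "users_db_3"),
]


def shard_for(last_name: str) -> str:
    """Return the name of the database instance that holds this user."""
    initial = last_name.strip()[:1].upper()
    for low, high, instance in SHARD_RANGES:
        if low <= initial <= high:
            return instance
    # Names that start with something outside A-Z (or empty names) land here.
    raise ValueError(f"no shard covers last name {last_name!r}")
```

Note that nothing here cares whether the schema on each instance is normalized. The router only decides where a row lives, not how the tables are shaped.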

Denormalization, on the other hand, can be done with or without a sharded database and is intended to simplify queries by reducing the joins/subqueries needed to answer a particular question. The trade-off is that the redundant copies of data have to be kept consistent, which you would typically do programmatically in application code.
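As a rough illustration of that trade-off, the sketch below uses an in-memory SQLite database with one normalized table and one denormalized table. The table names, columns, and helper functions are assumptions chosen for the example, and there is no sharding involved at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- Normalized: an order refers to its user by id only.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    -- Denormalized: the user's name is copied onto each order row.
    CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY, user_id INTEGER,
                                user_name TEXT, total REAL);
    INSERT INTO users VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 25.0);
    INSERT INTO orders_denorm VALUES (10, 1, 'Ada', 25.0);
""")


def order_normalized(order_id):
    # Needs a join to get the user's name.
    return conn.execute(
        "SELECT u.name, o.total FROM orders o "
        "JOIN users u ON u.id = o.user_id WHERE o.id = ?",
        (order_id,),
    ).fetchone()


def order_denormalized(order_id):
    # No join: the name is already on the row.
    return conn.execute(
        "SELECT user_name, total FROM orders_denorm WHERE id = ?",
        (order_id,),
    ).fetchone()


def rename_user(user_id, new_name):
    # The application has to update every copy of the name itself.
    with conn:  # one transaction, so the copies cannot drift apart
        conn.execute("UPDATE users SET name = ? WHERE id = ?",
                     (new_name, user_id))
        conn.execute("UPDATE orders_denorm SET user_name = ? WHERE user_id = ?",
                     (new_name, user_id))
```

The read from `orders_denorm` is simpler, but `rename_user` now has two places to update. If it missed one, the database itself would not complain.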

Some good reading:

Sharding theory & practice

Some highly-scalable database implementations 'in the wild'

DarkSquid