views:

91

answers:

4

How do I create a database for scalability? I am in the middle of http://www.slideshare.net/vishnu/livejournals-backend-a-history-of-scaling, which I can't read at the moment as I need to leave. But I would like to know more about creating a database that scales well. Some things it mentions, and that occur to me, are:

  • Separate handles for reads and writes?
  • What happens when one server is busy (IO or CPU bound) and I need two servers to write to?
  • Do I create multiple databases? Have a clusterId on users?
  • Will it be a problem when moving users from one cluster to another?
  • Might I code this so that user ABC in DB A on cluster A and user DEF in DB B on cluster B have the same PRIMARY KEY?
  • When I move the above to cluster C, does this mean I need to write a lot of code to move them to another cluster/database?
  • To avoid that issue, would I NOT use a PRIMARY KEY and instead set the ID by hand, reading the other DBs on other clusters?

etc

+5  A: 

To create a database that scales well for 99.9% of use cases, don't bother with any of that stuff. Instead, design a properly normalised schema; use primary key, foreign key, and other constraints to ensure integrity; and index tables well. Study your DBMS vendor's advice on performance and scalability topics such as partitioning, different table and index structures, etc., and use what works best for your case (benchmark the options to prove that they improve scalability).
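The "normalise, constrain, index" advice above can be sketched in a few lines. This is a minimal illustration using Python's stdlib sqlite3; the `users`/`posts` tables and all names are invented for the example, not taken from the question.

```python
import sqlite3

# in-memory database purely for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE          -- integrity via constraints
);
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    body    TEXT NOT NULL
);
-- index the foreign-key column that queries will filter and join on
CREATE INDEX idx_posts_user ON posts(user_id);
""")

conn.execute("INSERT INTO users (user_id, email) VALUES (1, 'abc@example.com')")
conn.execute("INSERT INTO posts (user_id, body) VALUES (1, 'hello')")

# the FK constraint rejects an orphaned row instead of corrupting the data
try:
    conn.execute("INSERT INTO posts (user_id, body) VALUES (99, 'orphan')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The point is that integrity lives in the schema, so every application touching the database gets it for free.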

Of course, if you work for Google, eBay or Amazon, then you may fall into the 0.1% camp that needs to throw away the rule book and do all this crazy stuff you are reading about. But I'm guessing you don't, right?

Tony Andrews
+1  A: 

To add to Tony's advice: partition your databases correctly into catalogues (the SQL Server term for a virtual database namespace inside a physical database server), and try to minimise the dependencies between catalogues, i.e., the query-level dependencies. If there are dependencies, make sure they are read-only.

This will allow you to move catalogues to different physical servers when needed. If you move a catalogue away from a server on which it has a read-only dependency on another catalogue (on the same physical server), you can simply replicate the data in question to a read-only catalogue on the new physical server.

The read-only requirement exists because replication is generally a one-way feature: you can have only one server as the write master, and the other servers just receive the data for the purpose of reading it locally.

The advice about replication is really useful for a worst-case scenario, and only as a one-off measure. It is not a solution for ad-hoc database growth; you should move away from an RDBMS if you ever have to grow this way. With the correct data models, replication-free movement of catalogues is possible.
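The single-write-master model described above (and the question's "separate handles for reads and writes") can be sketched as a tiny connection router. The server names and the SQL classification here are illustrative assumptions, not a real driver API.

```python
import itertools

class ReplicatedPool:
    """Route writes to the one master; round-robin reads over replicas."""
    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)  # read fan-out

    def connection_for(self, sql):
        # crude classification: anything that isn't a SELECT is a write
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.master

pool = ReplicatedPool(master="master-db",
                      replicas=["replica-1", "replica-2"])
print(pool.connection_for("SELECT * FROM users"))     # some replica
print(pool.connection_for("UPDATE users SET x = 1"))  # always the master
```

Because all writes funnel through one master, replicas can lag slightly; reads that must see their own writes would need to go to the master too.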

Hassan Syed
+1  A: 

What happens when one server is busy (IO or CPU bound) and I need two servers to write to?

If you are doing distributed transactions, you are in trouble, so you have to plan ahead to make sure the load across your distributed-transaction target servers is uniform.

Do I create multiple databases? Have a clusterId on users?

This is a very nice solution :P. You have to get the shared-data models correct so that you don't create a bottleneck on your shared catalogues.
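The clusterId idea amounts to a small directory that maps each user to the cluster owning their rows. A minimal sketch, with invented cluster names and an in-memory dict standing in for the shared catalogue:

```python
# hypothetical cluster -> server mapping (all names invented)
CLUSTERS = {
    "A": "db-server-a.internal",
    "B": "db-server-b.internal",
}

# the user -> cluster directory would live in a small shared catalogue
user_cluster = {"ABC": "A", "DEF": "B"}

def server_for(user_id):
    cluster_id = user_cluster[user_id]  # one cheap lookup, then ...
    return CLUSTERS[cluster_id]         # ... every query goes to one shard

def move_user(user_id, new_cluster):
    # moving a user is a directory update plus a data copy (not shown)
    user_cluster[user_id] = new_cluster

print(server_for("ABC"))  # db-server-a.internal
```

The directory itself must stay small and heavily cached, otherwise it becomes exactly the shared-catalogue bottleneck the answer warns about.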

Will it be a problem when moving users from one cluster to another?

No, distributed transactions for the win. You need to have a kickass programmer to make sure things happen correctly.

Might I code this so that user ABC in DB A on cluster A and user DEF in DB B on cluster B have the same PRIMARY KEY?

No, assign the primary key on a master RDBMS/LDAP server. You do not want primary-key collisions of this sort. Your chosen method depends on this being done correctly: you want globally unique user IDs. You will have shared data in this case, and if you do not have globally unique PKs, how will you relate the users to the shared data?
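One common way to get master-assigned, globally unique IDs is to embed the cluster number in the high bits of the key so collisions are impossible by construction. This scheme is an assumption for illustration, not something the answer specifies:

```python
import threading

class IdMaster:
    """Central allocator: every cluster asks it for the next user id."""
    def __init__(self):
        self._lock = threading.Lock()
        self._next = {}  # cluster number -> per-cluster counter

    def next_id(self, cluster_no, bits=16):
        with self._lock:                       # serialise concurrent requests
            n = self._next.get(cluster_no, 0)
            self._next[cluster_no] = n + 1
        # high bits identify the cluster, low bits are a local sequence,
        # so two clusters can never hand out the same id
        return (cluster_no << bits) | n

master = IdMaster()
a = master.next_id(cluster_no=1)  # a user created on cluster A
b = master.next_id(cluster_no=2)  # a user created on cluster B
assert a != b                     # globally unique by construction
```

A nice side effect: because the cluster number rides along in the key, moving a user to cluster C does not invalidate the key; the directory, not the key, says where the data lives now.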

Hassan Syed
+2  A: 

RDBMSs are great for keeping data consistent and transactional, but they require a lot of expert planning to scale to hundreds of thousands of transactions per second. I would build a NoSQL cloud to dump documents built from the RDBMS into.

So you use the RDBMS for the raw data and the NoSQL databases for the views on that data.
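The "RDBMS for raw data, NoSQL for views" split boils down to joining the normalised rows once, at write time, and dumping a denormalised document per user. A sketch with invented sample data; `json` stands in for a real document store:

```python
import json

# normalised rows as they might come out of the RDBMS (invented sample data)
users = [(1, "abc@example.com")]
posts = [(10, 1, "hello"), (11, 1, "second post")]

def build_document(user_row):
    """Denormalise one user into a self-contained document."""
    user_id, email = user_row
    return {
        "user_id": user_id,
        "email": email,
        # embed the user's posts so the view needs no joins at read time
        "posts": [{"post_id": p, "body": b}
                  for (p, u, b) in posts if u == user_id],
    }

# in a real system this JSON would be written to the document store
doc = json.dumps(build_document(users[0]))
print(doc)
```

The trade-off is the usual one: reads become a single key lookup with zero joins, at the cost of rebuilding or patching the document whenever the underlying rows change.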

Hassan Syed
I totally cannot resist: http://images.zomb.es/?v=faulttoler.png
gnucom