I'm designing the database schema for an application that may eventually need to spread some tables across several databases because of the number of entries a single user might make. I'm designing the table relationships with common best practices in mind, but I'm not yet thinking about server architecture, table partitioning, sharding, master/slave replication, etc. Is it necessary to consider those things when creating a schema, or am I thinking too far ahead? The only decision I've made so far is to enforce foreign key constraints at the application level rather than in the database, so that I can more easily move a table to a different database.
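As a rough illustration of what application-level enforcement involves, here is a minimal Python sketch. Two in-memory SQLite databases stand in for tables that live in separate databases; the table names and the `insert_entry` helper are hypothetical.

```python
import sqlite3

# Two separate databases stand in for tables that have been moved apart.
# No FOREIGN KEY can span them, so the application checks the reference itself.
users_db = sqlite3.connect(":memory:")
entries_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
entries_db.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT)"
)

def insert_entry(user_id: int, body: str) -> int:
    """Hypothetical helper: insert an entry only if the referenced user exists."""
    row = users_db.execute("SELECT 1 FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        raise ValueError(f"user {user_id} does not exist")
    # Nothing prevents the user from being deleted between the check above and
    # the insert below: the two databases share no transaction.
    with entries_db:  # commits the insert on success
        cur = entries_db.execute(
            "INSERT INTO entries (user_id, body) VALUES (?, ?)", (user_id, body)
        )
    return cur.lastrowid

with users_db:
    users_db.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

print(insert_entry(1, "first post"))  # 1
try:
    insert_entry(2, "orphan")
except ValueError as exc:
    print(exc)  # user 2 does not exist
```

Because the existence check and the insert run against different databases, a concurrent delete of the user can still leave an orphaned entry; that gap is exactly what a database-level constraint closes.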

A: 

Yes, you should design for the performance you will need at the database's expected production size. Small changes in the design can make a huge difference in how well the database scales. Databases are not easy to refactor, especially once they hold millions of records. You should also consider data security and data integrity in the design. Data integrity can only be reliably enforced at the database level; trying to do it in the application is foolish and short-sighted.
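To make "integrity at the database level" concrete, here is a small sketch using Python's built-in `sqlite3` module. The schema is hypothetical; the point is that the database itself rejects an orphaned foreign key, an out-of-range value, and a duplicate, no matter which application wrote the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE entries (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    rating  INTEGER CHECK (rating BETWEEN 1 AND 5)
);
""")
conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")

def try_insert(sql: str, params: tuple) -> str:
    """Run one statement and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert("INSERT INTO entries (user_id, rating) VALUES (?, ?)", (1, 4)))
print(try_insert("INSERT INTO entries (user_id, rating) VALUES (?, ?)", (2, 4)))  # no such user
print(try_insert("INSERT INTO entries (user_id, rating) VALUES (?, ?)", (1, 9)))  # fails CHECK
print(try_insert("INSERT INTO users (id, email) VALUES (?, ?)", (2, "a@example.com")))  # duplicate email
```

The exact error messages vary between SQLite versions, but every rejected statement raises `sqlite3.IntegrityError`.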

You should also consider what metadata, if any, you will need. Do you need auditing? Do you need GUIDs for replication? Do you need to know when the record was inserted? Are you going to plan for archiving?
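One possible shape for those metadata decisions, sketched with Python's `sqlite3` (table, column, and trigger names are hypothetical): a GUID primary key that can be merged across replicas, an insertion timestamp filled in by the database, a nullable archive marker, and an audit table populated by a trigger rather than by application code.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (
    id          TEXT PRIMARY KEY,                         -- GUID, safe to merge across replicas
    body        TEXT NOT NULL,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- when the row was inserted (UTC)
    archived_at TEXT                                      -- NULL until the row is archived
);
CREATE TABLE entries_audit (
    entry_id   TEXT NOT NULL,
    old_body   TEXT,
    new_body   TEXT,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Record every change to body in the database itself.
CREATE TRIGGER entries_audit_update AFTER UPDATE OF body ON entries
BEGIN
    INSERT INTO entries_audit (entry_id, old_body, new_body)
    VALUES (OLD.id, OLD.body, NEW.body);
END;
""")

entry_id = str(uuid.uuid4())  # generated client-side, so no central sequence is needed
with conn:
    conn.execute("INSERT INTO entries (id, body) VALUES (?, ?)", (entry_id, "draft"))
    conn.execute("UPDATE entries SET body = ? WHERE id = ?", ("final", entry_id))

# One audit row: the entry's id with old body 'draft' and new body 'final'.
print(conn.execute("SELECT entry_id, old_body, new_body FROM entries_audit").fetchall())
```

Adding columns like these up front is cheap; backfilling them into a table that already holds millions of rows is not.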

HLGEM
A: 

It sounds like you're still in the logical design phase--maybe an object model would be more convenient. I think you're jumping into a relational design too early. Some data will probably fit nicely into a relational DB with system enforced integrity constraints. Other things might fit better into a RESTful service, memcache or file system cluster. Don't commit to design features if your problem doesn't require them. And don't forget that ACID transactions are a design feature. ;)
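To show what an ACID transaction buys you (and what you give up if related rows end up in different databases), here is a minimal `sqlite3` sketch; the `accounts` table and `transfer` function are hypothetical. Both updates commit together, or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

def transfer(src: str, dst: str, amount: int) -> bool:
    """Move an amount between accounts atomically: both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

transfer("alice", "bob", 30)   # succeeds
transfer("alice", "bob", 500)  # the debit fails the CHECK, so the credit to bob is undone too
print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
# {'alice': 70, 'bob': 30}
```

If `alice` and `bob` lived in two different databases, this guarantee would have to be rebuilt in the application (for example with two-phase commit), which is the design cost the answer is pointing at.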

There are examples of scaling relational designs. WordPress has a monolithic relational design, but the incredibly sharded WordPress MU that runs wordpress.com shares 99% of its code. Other sites (this one!) support huge user communities on a relational design.
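For a sense of what sharding a relational design by user looks like, here is a hedged sketch; it is not how WordPress MU does it. In-memory SQLite databases stand in for separate servers, and `NUM_SHARDS`, `shard_for`, `add_entry`, and `entries_for` are hypothetical names. Every row for a given user lands on the same shard, so per-user queries and transactions stay local.

```python
import hashlib
import sqlite3

NUM_SHARDS = 4  # hypothetical shard count
# In-memory SQLite databases stand in for separate database servers.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for db in shards:
    db.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT)")

def shard_for(user_id: int) -> int:
    """Pick a shard from a stable hash of the user id, so a user's rows always land together."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def add_entry(user_id: int, body: str) -> None:
    db = shards[shard_for(user_id)]
    with db:
        db.execute("INSERT INTO entries (user_id, body) VALUES (?, ?)", (user_id, body))

def entries_for(user_id: int) -> list[str]:
    # A per-user query touches exactly one shard; a cross-user query would have to fan out.
    db = shards[shard_for(user_id)]
    rows = db.execute("SELECT body FROM entries WHERE user_id = ? ORDER BY id", (user_id,))
    return [row[0] for row in rows]

for uid in range(10):
    add_entry(uid, f"hello from {uid}")
add_entry(3, "second post")
print(entries_for(3))  # ['hello from 3', 'second post']
```

Plain modulo hashing moves most users when `NUM_SHARDS` changes; a lookup table or consistent hashing avoids that, at the cost of more moving parts.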

I think it's wrong to assume that you can ignore performance and add it later. Ignore performance and plan to throw it away? That's a reasonable approach.

Ken Fox