tags:

views:

137

answers:

3

I have a US company database which is 20 millions records. Firstly they is no budget for a massive RAM database server. So I think I am going to have to split the db into parts, 4 parts grouped by State.

My question is how is the best way to handle this with PHP, I am thinking get the users query find the State and then point to the relevant db? Any thoughts?

+1  A: 

sounds like you might want to consider sharding.

Not sure if you are using an ORM for data access, but some of them support sharding. Some info on sharding for php and mySQL here:

http://highscalability.com/database-sharding-netlog-mysql-and-php

just realised - link missing to the actual article in last url... try here: http://www.jurriaanpersyn.com/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/

Paul
+5  A: 

I think you need to look at the MySQL partitioning

silent
Ok I am new to this, how do I create the partition by State? If I can do this I think it will solve the problem
John Jones
A: 

You don't need PHP for all these operations. Maybe to generate SQL code. It's better to make SQL scripts that copy data from the original tables into the new. See "INSERT ... SELECT ..." and "CREATE TABLE ... AS SELECT ..." if you are not familiar with them yet.

If you have MySQL >=5.1, then try partitioning table so that any request hits only 1 partition.

  • If users need information only on 1 state, partition it by state. There can be a lot of partitions without overhead work for you. If users can see only a certain time frame, like month graphs in Webalizer, partition by months, and so on.

Also consider creating aggregate tables. Let me elaborate: in data warehouses there is a distinction between metrics and attributes.

  • An attribute is a column that tells where, when, what, what kind of.
  • A metric tells how much, how many.

An aggregate table has less level of detail: either less attributes (no geographical info, or no product info), or some steps upper the attributes in the full table (just state instead of city+state, year-month instead of date, and so on).

And the last: make sure your users really need the detailed old data. Some of the data becomes irrelevant in a couple of years. For instance, website referrers have no meaning after 1,5-2 years, since most of the websites change. The 2-years-old website traffic data can be just a number of daily/monthly graphs.

culebrón