I am working on a database that is relatively small. It has a total of 67 tables with a little over a million records, and is about 254 MB. The application that works with it has been running for about 5 years, and usage has doubled each year. This year we are projected to triple, which would almost double the database in size in a single season. My question is: is it a bad idea to split the database up into multiple databases? Say we have 300 clients; we would then create 300 individual databases, each containing the 67 tables but only the data pertaining to that client. There isn't much reason for the data to live together, aside from internal statistics, which can be computed on a different server. We shouldn't grow beyond 10,000 clients over the application's lifetime.

The problem I see with this setup is that when we need to make a change to the "master database" schema, the change would need to be replicated throughout all of the "slave databases".

Replication would also be a challenge whenever a new client is added.
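To make that fan-out concrete, here is a rough sketch of how a single schema change might be pushed to every per-client database in MySQL. It assumes the databases follow a naming convention like client_<id>; both that convention and the example ALTER are hypothetical, not something decided yet:

    -- Hypothetical sketch: apply one DDL change to every per-client database.
    -- Assumes per-client databases are named 'client_<id>'.
    DELIMITER //
    CREATE PROCEDURE apply_schema_change()
    BEGIN
      DECLARE done INT DEFAULT 0;
      DECLARE db VARCHAR(64);
      DECLARE cur CURSOR FOR
        SELECT schema_name FROM information_schema.schemata
        WHERE schema_name LIKE 'client\_%';
      DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

      OPEN cur;
      read_loop: LOOP
        FETCH cur INTO db;
        IF done THEN LEAVE read_loop; END IF;
        -- Example change only; the real DDL would be whatever the release needs.
        SET @ddl = CONCAT('ALTER TABLE `', db, '`.orders ADD COLUMN notes TEXT');
        PREPARE stmt FROM @ddl;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
      END LOOP;
      CLOSE cur;
    END //
    DELIMITER ;

Every release would need a script like this run (and verified) 300 times over, which is the maintenance burden I'm worried about.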

At the code level, the application is pretty much already set up for this type of arrangement.

Is there anything I am missing? Is this a terrible idea?

The database was created in haste (not by me) with no thought for the future, and now it is my responsibility.

There is plenty to be done as far as normalization, field-type auditing, SQL optimization, indexing, and server tuning go. Any feedback would be greatly appreciated.

A: 

The question I have is: how is the database accessed? Is there one application install per client? If so, keeping the databases separate can buy you some time when upgrading the application (since you only need to update a client's database when you update that client's application). If they are all accessed by a single application install, keep them together.

But there are other considerations to keep in mind. You mention that the size today is 1 million rows at ~254 MB. That should be easily within reach of a commodity server. So if you expect worst-case 5-fold growth each year, you're talking 5 million rows this year, 25 million the next, 125 million the third, 625 million the fourth, and 3,125 million the fifth. Even those 3 billion rows (depending on the exact usage and types of queries) aren't that hard for MySQL to handle (still within the upper range of a commodity server)...

Plus, if you start running into problems, you could always partition each table (or just the major tables) on the client key... The partitions are managed automatically by MySQL, so you don't have the maintenance nightmare of managing them yourself...
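For illustration, a minimal sketch of that kind of partitioning (the table and column names are hypothetical, since the post doesn't name any):

    -- Sketch: partition a large table on the client key so each client's
    -- rows are physically grouped, while staying in one logical database.
    CREATE TABLE orders (
        id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
        client_id  INT UNSIGNED NOT NULL,
        created_at DATETIME NOT NULL,
        PRIMARY KEY (id, client_id)   -- every unique key must include the partition column
    )
    PARTITION BY KEY (client_id)
    PARTITIONS 32;                    -- MySQL hashes clients across 32 partitions

Queries that filter on client_id then only touch the partition holding that client's rows, which gets you most of the isolation benefit of per-client databases without the schema fan-out.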

ircmaxell
It is a web application. Changes would be made against a test client and then deployed.
rizzo0917
Right, but there's a single instance of the web application, right? Meaning only one website accesses all of the data (rather than a different website for each client)...
ircmaxell
A: 

Currently you have a quarter of a gig of data. You envisage it doubling (to half a gig) this year. Is it 1997? No, it is 2010, and people have gigabytes of data on their phones.

So the question is: what problem are you trying to solve? It cannot be storage, because that is a trivial amount of data. If it's performance, then I think splitting into multiple databases is likely to make things worse, unless you're envisaging a server per database. There is an argument for separate databases from a security perspective, but there are different ways of addressing those concerns.

Do you have problems with your current environment? Or at least trends which suggest you may have problems in twelve months' time? If no, then just sit tight. If yes, formulate them clearly and then figure out how 300 databases will solve those problems, and whether they will be worth the inevitable grief. Then recalibrate that grief to account for 10,000 clients and ask the question again.

There may be some questions to which the best answer is "ten thousand databases" but not very many.


"Our biggest client adds about 12000 records per year. "

In other words, about one record every ten working minutes: 12,000 records over roughly 250 working days is about 48 per day, and an eight-hour day has exactly 48 ten-minute slots. This does not seem like a large write load.

"The idea is rather then a client go through all the data, it just accesses their data."

But it's not a lot of data, and certainly nothing that a decent indexing strategy can't fix.
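As an illustration of such an indexing strategy (again, table and column names are hypothetical), a composite index leading on the client key means each client's queries read only that client's rows:

    -- Composite index: client key first, so per-client queries never
    -- scan other clients' data.
    CREATE INDEX idx_records_client ON records (client_id, created_at);

    -- A typical per-client query can then be satisfied from the index range:
    SELECT *
    FROM records
    WHERE client_id = 42
      AND created_at >= '2010-01-01';

With the index in place, the optimizer seeks directly to client 42's rows, which is exactly the "only accesses their data" behaviour the split was meant to achieve.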

I still don't understand whether you have an actual real problem now or you're just thinking about something which might be a problem at some point in the future.

APC
The problem is performance. Our biggest client adds about 12,000 records per year, whereas our average client adds about 1,000 records per year. The idea is that rather than a client going through all the data, it just accesses its own data. How would that make performance worse, as you said it is likely to do?
rizzo0917