Hi everybody,

I did just come up with the following idea, but I lack the knowledge to say if it is applicable to a production application.

We have a web application, built on PHP/MySQL to keep things easy. One table in the database is prone to growing large - easily a couple million records - so table sharding might be an option here.

Here's how I have imagined the process to work:

A cached file contains a list of the available tables in the database. Each table holds a maximum of a million rows; when that limit is reached, a new table is created and the cached list is regenerated.

Obviously it wouldn't be a good idea to check the row count on every write to the table, so this could be done at a set interval - daily or weekly, depending on how quickly each million rows is created.
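Roughly, I picture it working like this (just a sketch - the file name, table naming scheme, and row limit are made up for illustration):

    <?php
    // Sketch only: connect, then load the cached list of shard tables.
    mysql_connect('localhost', 'user', 'pass');
    mysql_select_db('mydb');
    $tables = unserialize(file_get_contents('table_list.cache'));

    // Writes always go to the newest (last) table in the list.
    $current = end($tables); // e.g. "records_7"
    mysql_query("INSERT INTO `$current` (payload) VALUES ('...')");

    // Run periodically (e.g. a daily cron job): if the current table
    // has reached the limit, create the next table and rebuild the cache.
    $count = mysql_fetch_row(mysql_query("SELECT COUNT(*) FROM `$current`"));
    if ($count[0] >= 1000000) {
        $next = 'records_' . (count($tables) + 1);
        mysql_query("CREATE TABLE `$next` LIKE `$current`");
        $tables[] = $next;
        file_put_contents('table_list.cache', serialize($tables));
    }
    ?>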

Would this be a good way to deal with large amounts of data and to keep index sizes fairly low?

Thanks

A: 

In all honesty, I don't think that would be a great idea. You should look into archiving old data, or moving to a NoSQL solution like Mongo.

Mitch C
+1 I do think that re-inventing the wheel and having queries cross over multiple tables like that (in the end it could run across thousands of tables) will be a PITA in the long run.
Frankie
+2  A: 

I would probably go with Apache Cassandra.

You are building a site in PHP/MySQL to keep it easy, and then you'll be re-inventing the wheel on an extremely big problem.

Don't mess with the data. Go for a proven solution.

Frankie
+1  A: 

You should use horizontal partitioning: partition the table by number of records, say a million records per partition. That way MySQL handles the partitioning internally, and instead of one big index you get a smaller index per partition.
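For example, something along these lines (table and column names are hypothetical):

    -- One partition per million rows, keyed on the auto-increment id.
    -- MySQL prunes partitions at query time, and each partition keeps
    -- its own (smaller) index.
    CREATE TABLE records (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        created_at DATETIME NOT NULL,
        payload TEXT,
        PRIMARY KEY (id)
    )
    PARTITION BY RANGE (id) (
        PARTITION p0 VALUES LESS THAN (1000000),
        PARTITION p1 VALUES LESS THAN (2000000),
        PARTITION p2 VALUES LESS THAN (3000000),
        PARTITION pmax VALUES LESS THAN MAXVALUE
    );

New partitions can be split out of pmax later with ALTER TABLE ... REORGANIZE PARTITION, so you don't have to predefine them all up front.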

Read more here: http://dev.mysql.com/tech-resources/articles/performance-partitioning.html

ovais.tariq
A: 

The performance of indexes does not degrade linearly with the size of the table. Tables have to be seriously massive before that becomes an issue. If you are seeing performance problems, I'd start running EXPLAIN on your queries and making sure they all do the fewest row scans they can. You might be surprised at what the actual bottleneck turns out to be.
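For instance (the query is hypothetical):

    -- Check the `key` and `rows` columns of the output: they show which
    -- index MySQL picked and how many rows it expects to scan.
    EXPLAIN SELECT * FROM records WHERE created_at > '2010-01-01';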

So, basically, if you need the data, I wouldn't go messing around with it. On the other hand, if it's something like session data, just delete the rows that are too old.
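Something like this (table name and retention window are hypothetical):

    -- Purge session rows older than 30 days; run it from a cron job.
    DELETE FROM sessions WHERE last_active < NOW() - INTERVAL 30 DAY;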

GrandmasterB