Hi all,
I'm a lone developer for a telecoms company, and am after some database design advice from anyone with a bit of time to answer.
I am inserting into one table ~2 million rows each day, these tables then get archived and compressed on a monthly basis. Each monthly table contains ~15,000,000 rows. Although this is increasing month on month.
For every insert I do above I am combining the data from rows which belong together and creating another "correlated" table. This table is currently not being archived, as I need to make sure I never miss an update to the correlated table. (Hope that makes sense) Although in general this information should remain fairly static after a couple of days of processing.
All of the above is working perfectly. However my company now wishes to perform some stats against this data, and these tables are getting too large to provide the results in what would be deemed a reasonable time. Even with the appropriate indexes set.
So I guess after all the above my question is quite simple. Should I write a script which groups the data from my correlated table into smaller tables. Or should I store the queries result sets in something like memcache? I'm already using mysqls cache, but due to having limited control over how long the data is stored for, it's not working ideally.
The main advantages I can see of using something like memcache:
- No blocking on my correlated table after the query has been cashed.
- Greater flexibility of sharing the collected data between the backend collector and front end processor. (i.e custom reports could be written in the backend and the results of these stored in the cache under a key which then gets shared with anyone who would want to see the data of this report)
- Redundancy and scalability if we start sharing this data with a large amount of customers.
The main disadvantages I can see of using something like memcache:
- Data is not persistent if machine is rebooted / cache is flushed.
The main advantages of using MySql
- Persistent data.
- Less code changes (although adding something like memcache is trivial anyway)
The main disadvantages of using MySql
- Have to define table templates every time I want to store provide a new set of grouped data.
- Have to write a program which loops through the correlated data and fills these new tables.
- Potentially will still grow slower as the data continues to be filled.
Apologies for quite a long question. It's helped me to write down these thoughts here anyway, and any advice/help/experience with dealing with this sort of problem would be greatly appreciated.
Many thanks.
Alan