I have an online game where I record lots of gameplay stats. These stats tables grow very quickly, and I have to be careful: once a table gets big enough, simply recording more stats can noticeably hurt the game's performance.
My strategy, which isn't a very good one, is simply to keep the stats tables small. An automated process creates a new table every 24 hours, which keeps performance from getting too out of hand, but the result is an ugly sort of "rotation" of stats tables. The tables use InnoDB with a couple of indexes to improve performance, and I keep 30 of them around (each covers 24 hours, so I'm saving a month of stats). Every 24 hours, the automated process deletes the "stats30" table, renames each numbered table to the next higher number, and then creates a new, blank table called simply "stats". This is the "live" table, where stats are actively being recorded.
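Roughly, the nightly job does the equivalent of this (shown with 3 daily tables instead of 30 to keep it short; the column definitions are just placeholders, the real schema is wider):

```sql
DROP TABLE stats3;
RENAME TABLE stats2 TO stats3;
RENAME TABLE stats1 TO stats2;
RENAME TABLE stats  TO stats1;

-- Fresh "live" table for the next 24 hours
CREATE TABLE stats (
    player_a     INT UNSIGNED NOT NULL,
    player_b     INT UNSIGNED NOT NULL,
    transactions INT UNSIGNED NOT NULL DEFAULT 1,
    PRIMARY KEY (player_a, player_b),
    KEY (player_b)
) ENGINE=InnoDB;
```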
These tables basically record every transaction between every pair of players who interact in the game, so the data grows combinatorially. When a new transaction occurs, the code checks whether a row already exists for the transactions between these two players during that day. If it does, the row is updated with the changes; otherwise, a new row is created. A pair of players who interact 1,000 times in a day and a pair who interact just once will each have a single row in the table for that day. Every action on the database involves a SELECT followed by either an UPDATE or an INSERT, so as currently designed the load is fairly evenly split between reads and writes. Reads in the larger sense, i.e. analysis queries across stats and multiple players, happen very infrequently relative to the single-row SELECTs, UPDATEs, and INSERTs. About 150,000 rows are created per day.
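In simplified SQL, each game transaction does something like the following (the column names and player IDs are made up for illustration; the real rows track more detail than a single counter):

```sql
-- Check for an existing row for this pair of players today
SELECT transactions
  FROM stats
 WHERE player_a = 42 AND player_b = 99;

-- If a row came back, accumulate into it...
UPDATE stats
   SET transactions = transactions + 1
 WHERE player_a = 42 AND player_b = 99;

-- ...otherwise start a new row for this pair, for this day
INSERT INTO stats (player_a, player_b, transactions)
VALUES (42, 99, 1);
```

I realize this could probably be collapsed into a single INSERT ... ON DUPLICATE KEY UPDATE; I show the SELECT-then-write pattern only because that's how it currently runs.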
I know this could be better. I can't easily reduce the amount of data I'm recording, and my two concerns are (1) performance and (2) simplicity. I could squeeze out more performance by creating a new table every 4 hours, for example, but then I'd be juggling 180 tables. Conversely, I could simplify things by using a single table, and then everything would come to a screeching halt.
Note that I do need to update rows in these tables, so I can't use something like the ARCHIVE storage engine (which doesn't support UPDATE). However, I only ever INSERT or UPDATE on the "live" stats table; the older tables are effectively read-only.
There is also the minor issue that any writes arriving during the daily rotation may be lost: while the process is renaming all the tables and creating the new one, the "stats" table briefly doesn't exist, so incoming inserts can fail. Losing a few inserts isn't a big problem, but a solution where this error can't occur, or where the rotation happens atomically, would be better.
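One idea I had, but haven't verified, is to pre-create the replacement table and swap everything in a single multi-table RENAME TABLE statement, which I believe MySQL executes atomically. A sketch, again with 3 tables, where stats_new is just an illustrative name:

```sql
DROP TABLE stats3;                    -- free up the oldest slot first
CREATE TABLE stats_new LIKE stats;    -- copies columns and indexes

-- One statement, so there is never a moment when "stats" doesn't exist
RENAME TABLE stats2    TO stats3,
             stats1    TO stats2,
             stats     TO stats1,
             stats_new TO stats;
```

The live table keeps accepting writes right up until the RENAME, so if this really is atomic, the lost-insert window should disappear.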
Thanks for any ideas that might help! :)