My work currently uses MySQL (MyISAM) exclusively for all data storage. We currently have over 300 webservers and about 150 databases. Unfortunately I'm in a position where I have to design a table structure that supports over 100 million rows over a 30-day period. The idea is:
- High volume inserts (no updates or deletes and always at the end of the table)
- Single-row selects
- Data older than 30 days is thrown away
The best solution seems to be a table per day, combined into a MERGE table for the selects. There will be duplicate data across the daily tables, but the SELECT only pulls the most recent row based on a timestamp and an int field. Obviously having 30 tables isn't ideal, but so goes life.
Are there any inherent flaws with this approach? Are there any other ways to approach this that I am missing (we are stuck on MySQL 5.0)? Will table locking be a huge issue when performing the ALTER TABLE on the merge table at the start of each day, when the new day's table is created? We currently have a table rotation structure in place, but if we went with a single table instead, copying the rows we want to keep out of the old table into the new one would be quite slow as it approaches 100 million rows.
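For reference, the daily rotation I have in mind would look roughly like this (table names like merge_test_YYYYMMDD and the template table are assumptions, not what we actually run). As I understand it, redefining the UNION only rewrites the small .MRG definition file rather than touching the data, so the lock should be brief, but I'd want that confirmed:

CREATE TABLE merge_test_20090201 LIKE merge_test_template;

ALTER TABLE merge_test
  UNION=(merge_test_20090103, /* ...intervening days... */ merge_test_20090201);

DROP TABLE merge_test_20090102;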
There are other technologies out there to accomplish this in an elegant manner, but our sales team already sold the solution and we don't have the luxury of time.
Any input would be appreciated.
Structure:
CREATE TABLE `merge_test_1` (
`date_stamp` bigint unsigned NOT NULL,
`hash` char(32) NOT NULL,
`p_id` mediumint(8) unsigned NOT NULL,
`a_id` mediumint(8) unsigned NOT NULL,
`b_id` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`hash`,`p_id`,`date_stamp`)
) ENGINE=MyISAM
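And the MERGE table over the daily tables would be declared something like the sketch below (MRG_MyISAM and INSERT_METHOD are the standard MERGE engine options; the list of underlying tables is illustrative). Note the index is a plain KEY here: as far as I know a MERGE table cannot enforce uniqueness across its underlying tables, which is part of why duplicates are tolerated:

CREATE TABLE `merge_test` (
  `date_stamp` bigint unsigned NOT NULL,
  `hash` char(32) NOT NULL,
  `p_id` mediumint(8) unsigned NOT NULL,
  `a_id` mediumint(8) unsigned NOT NULL,
  `b_id` mediumint(8) unsigned NOT NULL,
  KEY (`hash`,`p_id`,`date_stamp`)
) ENGINE=MRG_MyISAM INSERT_METHOD=LAST
  UNION=(merge_test_1 /* , one table per day... */);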
Query example:
SELECT b_id,a_id FROM merge_test WHERE hash='1' AND p_id=1
ORDER BY date_stamp DESC LIMIT 1