views:

24

answers:

2

My work currently uses MySQL (MyISAM) exclusively for all data storage. We currently have over 300 webservers and about 150 databases. Unfortunately I'm in a position to have to write a table structure to support over 100 million rows in a 30 day period. The idea is:

  1. High volume inserts (no updates or deletes and always at the end of the table)
  2. 1 row selects
  3. Data older than 30 days is thrown away

The best solution seems to be to have a table for each day combined into a Merge table for the selects. There will be duplicate data indeed, but the SELECT will only pull the most recent row based on a timestamp and an int field. Obviously having 30 tables isn't ideal but so goes life.

Are there any inherent flaws with this approach? Are there any other ways to approach this that I am missing (we are stuck at 5.0)? Will table locking be a huge issue when performing the ALTER TABLE on the merge table when the new day's table is created? We currently have a table rotation structure in place but if we go with a single table having to select the data we want out of the old table into the new one would be quite slow as it approaches 100 million rows.

There are other technologies out there to accomplish this in an elegant manner, but our sales team already sold the solution and we don't have the luxury of time.

Any input would be appreciated.

Structure:

CREATE TABLE `merge_test_1` (
   `date_stamp` long NOT NULL,
   `hash` char(32) NOT NULL,
   `p_id` mediumint(8) unsigned NOT NULL,
   `a_id` mediumint(8) unsigned NOT NULL,
   `b_id` mediumint(8) unsigned NOT NULL,
   PRIMARY KEY  (`hash`,`p_id`,`date_stamp`)
 ) ENGINE=MyISAM

Query example

SELECT b_id,a_id FROM merge_test WHERE hash='1' AND p_id=1
ORDER BY date_stamp DESC LIMIT 1
A: 

If I'm getting the core of this question that indexing will be fruitless because of the high-volume inserts, and, searching based off MAX(id) doesn't meet your criteria... "the SELECT will only pull the most recent row based on a timestamp and an int field."

Have you tested using a view for this purpose? Seems plausible for a win.

E.g.

CREATE TABLE lotsofdata (
id INT UNSIGNED AUTO_INCREMENT,
int_val INT UNSIGNED,
the_timestamp TIMESTAMP,
PRIMARY KEY(id));
--
CREATE VIEW FROM 
SELECT id,int_val,the_timestamp 
FROM lotsofdata
WHERE the_timestamp = MAX(the_timestamp)
AND MAX(int_val)
LIMIT 0,1;

I hope this helps. If you can supply the table structure and a query example, I'd like to help. I just need some more specifics.

randy melder
I should've stated that our DBA group severely limits what we can do and Views are not supported. Edited post with examples structure and query.
methodin