I am trying to take one step towards optimizing a 90 GB+ table:
Old Table
Every day, approximately 750,000 records are pulled from an external source and appended to the table under the new date. This has been going on for three years, from what I understand. Roughly 97% of the records don't change from one day to the next.
New Table
I am trying to go through the old table (roughly 750,000 rows a day for three years, so on the order of 800 million rows) and eliminate the redundancy, which will likely reduce the table size quite dramatically. Here is the layout I have in mind (a CREATE TABLE sketch follows the field lists):
old_table
- date
- record_id
- data_field (really many fields, but collapsed to one for the sake of the example)
new_table_index
- date
- index_id
new_table
- index_id
- record_id
- data_field
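
For concreteness, here is roughly how I picture the two new tables. The column types are guesses on my part, and I am assuming index_id is an AUTO_INCREMENT surrogate key:

    CREATE TABLE new_table (
        index_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        record_id  INT UNSIGNED    NOT NULL,
        data_field VARCHAR(255)    NOT NULL,  -- stand-in for the many real fields
        PRIMARY KEY (index_id),
        KEY idx_record (record_id, index_id)  -- for "latest version of this record" lookups
    );

    CREATE TABLE new_table_index (
        date     DATE            NOT NULL,
        index_id BIGINT UNSIGNED NOT NULL,
        PRIMARY KEY (date, index_id)
    );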
Logic, as we go through each record in old_table (see the SQL sketch after this pseudocode):

    if (record_id is not in new_table) or (its latest entry in new_table has a different data_field)
        insert the record into new_table and get the new index_id
    else
        get the index_id of the latest entry for that record_id from new_table
    always
        insert that index_id and the date into new_table_index
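
Here is as far as I got translating that into SQL for a single day's pass. It is untested, and it leans on two assumptions: index_id is AUTO_INCREMENT, and the days are migrated oldest-first, so the latest version of a record is simply its MAX(index_id). @d stands for the day being processed:

    -- One day's pass; run once per date in old_table, oldest first.
    SET @d = '2020-01-01';

    -- Step 1: insert a new version for records that are new or whose data changed.
    INSERT INTO new_table (record_id, data_field)
    SELECT o.record_id, o.data_field
    FROM old_table o
    LEFT JOIN (
        SELECT n.record_id, n.data_field
        FROM new_table n
        JOIN (
            SELECT record_id, MAX(index_id) AS max_id
            FROM new_table
            GROUP BY record_id
        ) latest ON latest.max_id = n.index_id
    ) cur ON cur.record_id = o.record_id
    WHERE o.date = @d
      AND (cur.record_id IS NULL                      -- record never seen before
           OR NOT (cur.data_field <=> o.data_field)); -- null-safe "has it changed?"

    -- Step 2: every record present on this day now has its current version as
    -- its highest index_id, so point the day at that version.
    INSERT INTO new_table_index (date, index_id)
    SELECT o.date, latest.max_id
    FROM old_table o
    JOIN (
        SELECT record_id, MAX(index_id) AS max_id
        FROM new_table
        GROUP BY record_id
    ) latest ON latest.record_id = o.record_id
    WHERE o.date = @d;

Running that pair once per day of history keeps the work set-based inside MySQL instead of pulling rows into PHP. Since the real table has many fields, I assume the data_field comparison would become either a column-by-column compare or a hash of the concatenated values. Reading one day's snapshot back out of the new layout would then just be a join:

    SELECT i.date, n.record_id, n.data_field
    FROM new_table_index i
    JOIN new_table n ON n.index_id = i.index_id
    WHERE i.date = @d;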
Any thoughts on the optimal way to do this? I am not advanced enough with MySQL to be confident the above is right or efficient. When I tried writing a script in PHP, it used up 3 GB of memory and then failed. Any other suggestions or queries? Thanks so much!