I have accumulated quite a lot of data in raw form (CSV and binary): 4GB per day for a few months, to be precise.
I decided to join the civilized world and use a database to access the data, and I am wondering what the correct layout would be. The format is quite simple: a few fields for every tick (bid, ask, timestamp, etc.) x up to 0.5 million ticks/day x hundreds of financial instruments x months of data.
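For a sense of scale, the multiplication above can be sketched as a rough upper bound. The instrument count (200) and the number of trading days (60) are placeholder assumptions standing in for "hundreds" and "a few months", not measured figures, and 0.5 million is the per-instrument daily maximum rather than a typical value.

```python
# Back-of-envelope upper bound on the number of tick rows.
TICKS_PER_DAY = 500_000   # "up to 0.5 million/day" per instrument (from the question)
INSTRUMENTS = 200         # assumption: stand-in for "hundreds"
TRADING_DAYS = 60         # assumption: stand-in for "a few months"

rows_per_day = TICKS_PER_DAY * INSTRUMENTS
total_rows = rows_per_day * TRADING_DAYS

print(f"rows per day (upper bound): {rows_per_day:,}")
print(f"total rows (upper bound):   {total_rows:,}")
```

Even if the real numbers are several times smaller, the table would hold hundreds of millions of rows or more, which is why the layout and indexing matter here.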
There is a MySQL server with MyISAM (which I understand is the correct engine for this type of usage) running on commodity hardware (2 x 1GB RAID 0 SATA, Core 2 @ 2.7GHz).
What would be the correct layout of the database? What should the tables and indices look like? What are the general recommendations for this scenario? What pitfalls would you expect me to run into along the way?
Edit: my most common usage will be simple queries that extract time series information for a specific date and instrument, e.g.
SELECT (ask + bid) / 2
FROM ticks -- placeholder table name; the actual layout is what I am asking about
WHERE instrument='GOOG'
AND date = '01-06-2008'
ORDER BY timeStamp;
Edit: I tried putting all my data into one table indexed by timeStamp, but it was way too slow, so I reckon it will take a more elaborate scheme.