This is a follow-up to my question "Efficiently storing 7.300.000.000 rows" (http://stackoverflow.com/questions/665614/efficiently-storing-7-300-000-000-rows).
I've decided to use MySQL with partitioning and the preliminary schema looks like this:
CREATE TABLE entity_values (
    entity_id MEDIUMINT UNSIGNED DEFAULT 0 NOT NULL, # 3 bytes = [0 .. 16.777.215]
    date_id SMALLINT UNSIGNED DEFAULT 0 NOT NULL,    # 2 bytes = [0 .. 65.535]
    value_1 MEDIUMINT UNSIGNED DEFAULT 0 NOT NULL,   # 3 bytes = [0 .. 16.777.215]
    value_2 MEDIUMINT UNSIGNED DEFAULT 0 NOT NULL,   # 3 bytes = [0 .. 16.777.215]
    UNIQUE KEY (entity_id, date_id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
  PARTITION BY HASH(entity_id) PARTITIONS 25;
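For what it's worth, here is a minimal sketch of how I plan to bulk-load the dump into this table; the file path and column order are placeholders, not a fixed part of the setup:

# Minimal bulk-load sketch; the file path and column order are placeholders.
# (ALTER TABLE ... DISABLE KEYS wouldn't help here: on MyISAM it only skips
# non-unique indexes, and the only index on this table is UNIQUE.)
LOAD DATA INFILE '/tmp/entity_values.tsv'
INTO TABLE entity_values
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(entity_id, date_id, value_1, value_2);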
This gives:
- Rows = 7.300.000.000 rows (as per the requirements stated in the previous post)
- Size/row = 11 bytes (3+2+3+3)
- Total size = 7.300.000.000 rows * 11 bytes = 80.300.000.000 bytes = 80.3 GB
- Partitions = 25 (roughly 3.2 GB per partition; the partition size is somewhat arbitrary)
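Once the data is in, the actual per-partition distribution can be sanity-checked against these numbers with a generic catalog query (MySQL 5.1+, which partitioning requires anyway):

SELECT PARTITION_NAME,
       TABLE_ROWS,
       DATA_LENGTH / 1024 / 1024 / 1024 AS data_gb
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'entity_values'
ORDER BY PARTITION_NAME;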
Please note that I've dropped the primary key from the original design since the "id" column won't be used.
Now to my question: given the requirements outlined in my previous post and the schema above, do you have any suggestions for further optimizations or tweaks? Or is the above schema "optimal", given that I've decided to use MySQL?
Update: I tried loading the current data set into the schema above: the 8.570.532 rows took 212.000.000 bytes of disk space, which works out to roughly 24.7 bytes per row. The difference from the theoretical 11 bytes per row is presumably mostly the unique index, which MyISAM stores in a separate .MYI file.
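To confirm where the extra bytes per row go (data file vs. index file), a generic catalog query like this should do:

SELECT TABLE_ROWS,
       DATA_LENGTH / TABLE_ROWS AS data_bytes_per_row,
       INDEX_LENGTH / TABLE_ROWS AS index_bytes_per_row
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'entity_values';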
Update: Please note that the composite index on (entity_id, date_id) will also be used for queries that filter on entity_id alone, since entity_id is the leftmost column of the index.
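This can be double-checked with EXPLAIN PARTITIONS (the entity_id value below is just an example); it should also show the query being pruned to a single partition:

EXPLAIN PARTITIONS
SELECT date_id, value_1, value_2
FROM entity_values
WHERE entity_id = 12345;
# Expected: key = entity_id (MySQL's auto-generated name for the unnamed
# unique index) and a single entry in the "partitions" column, since
# HASH(entity_id) allows pruning.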