Hi, I have a very large table, currently around 70M rows and growing daily by the thousands. The current schema is tipping over every day now, so I'm moving to a partitioned table and redesigning the DDL.

The table is basically a collection of NOT NULL integer columns (some MEDIUMINT, some INT, some TINYINT) which need a unique constraint over a set of 7 columns (there are more columns in the table). This is very expensive to check per insert and greatly increases the index file size. Furthermore, since I never retrieve by it, I would prefer to drop it and instead store an MD5 hash, or maybe a simple concatenation, of the values... I don't know yet.

The problem is that the only column type that can hold such a large unique number is a VARCHAR, so I'm questioning whether this PK will actually be better. Also, since I will have a primary key 'part_key' (site_id, id), I will have to take the unique constraint into account when designing the partitioning.

To summarize: I'm sure this is not a new problem, but I wasn't able to find any benchmarks/documents comparing the two approaches. Does anyone have experience with this problem? The question really is: should the PK be the whole 8 fields (keep in mind this table will probably grow past 100M rows) when I never retrieve by the PK, or just a hashed value of the unique fields?

P.S.: Retrieval is mainly done by two of the 7 columns. Disk size is not an issue. Thanks.
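To make the hashed-key idea concrete, here is the kind of thing I have in mind (a sketch only; the column names are invented and only 3 of the 7 unique columns are shown), storing the MD5 of the unique columns as a fixed 16-byte binary instead of a VARCHAR:

create table demo_rows (
    site_id  mediumint unsigned not null,
    id       int unsigned       not null,
    c1       int unsigned       not null,
    c2       smallint unsigned  not null,
    c3       tinyint unsigned   not null,
    row_hash binary(16)         not null,  -- md5 of the unique columns, stored as 16 raw bytes
    primary key (site_id, id),
    unique key uq_row_hash (row_hash)
) engine=InnoDB;

-- compute the hash at insert time; unhex(md5(...)) packs the 32-character hex digest
-- into 16 bytes, so the unique index stays much smaller than a varchar(32) would
insert into demo_rows (site_id, id, c1, c2, c3, row_hash)
values (1, 42, 100, 7, 3, unhex(md5(concat_ws('#', 100, 7, 3))));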

A: 

Until MySQL gets partition pruning, I suggest (gulp) denormalizing your tables to fake partitioning. Do something like take your first value modulo 32 and make 32 tables.

Update: apparently MySQL 5.1.6 and later do support pruning (http://dev.mysql.com/doc/refman/5.1/en/partitioning-pruning.html), so my stronger advice is to upgrade, then let MySQL handle the partitioning for you, possibly using a hash of one of your 7 columns.
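As a rough sketch of what that could look like (table and column names are made up here; also keep in mind that MySQL's partitioning rules require the partitioning column to be part of every unique key, including the primary key):

create table demo_part (
    site_id mediumint unsigned not null,
    id      int unsigned       not null,
    c1      int unsigned       not null,
    primary key (site_id, id)
) engine=InnoDB
-- hash partitioning on site_id, which is part of the primary key, as required
partition by hash(site_id)
partitions 32;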

longneck
A: 

If you can find a good hash that matches your record lookups, then applying your unique constraint on each partition shouldn't be that big a deal. Smaller partition sizes will make your unique constraint less expensive. (If I'm wrong, someone here will school me, I'm sure.)

I'm stuck on MySQL 5.0, and I'm facing manually partitioning a few tables of over 40M rows. I have a document ID that I can hash in my application: floor(docID/10) % 100. That gives me 100 partitions, which should keep my index sizes down significantly. I ran a query on the table and counted the number of rows in each hash bucket:

-- count rows per bucket to check how evenly the 100 partitions would fill
select count(docID) as bucket_rows, floor(docID/10)%100 as partno
from documents
group by partno

Luckily, I found a very even distribution on my first try. Your own formula will be different, and I have no idea what your distribution would look like. Are you concerned that your unique constraint will not hold up in the face of partitioning?
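For the manual route on 5.0, the buckets just become separate tables that the application picks itself. Something like this is what I have in mind (a sketch only, reusing my documents table; only one of the 100 bucket tables is shown):

-- the application computes floor(docID/10) % 100 and routes to documents_00 .. documents_99
create table documents_37 like documents;
insert into documents_37
select * from documents
where floor(docID/10) % 100 = 37;

The indexes (and your unique constraint) are then only ever checked against the rows in a single bucket.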

If you can take advantage of MySQL's native partitioning, it will be more powerful and have less of an impact on your application.

memnoch_proxy