views:

277

answers:

2

MySQL table with couple of fields:

  • id - PRIMARY KEY
  • url - CHAR(255).

url field is also unique and indexed. Currently have couple of hundreds of thousands entries in that table. MySQL gets really slow.

The idea is: if I add hash BIGINT UNIQUE INDEXED field, and create composite index hash/url - will it work faster? Means will MySQL first check my hash/url pair using hash and then url? Will it be faster?

If answer is yes - for what reason this is not transparently implemented for indexed strings in MySQL?

+2  A: 

The idea is: if I add hash BIGINT UNIQUE INDEXED field, and create composite index hash/url - will it work faster?

The key lookup on a BIGINT field is faster than on a CHAR field (surprisingly, performance gain is more noticeable on a key miss than on a key hit).

Note, however, that a BIGINT hash has very high probability of hash collisions, that's why I'd not recommend to use it in a UNIQUE field

Means will MySQL first check my hash/url pair using hash and then url?

If you create a composite key on (hash, url) and search for the hash only, it will use the ref condition on the hash part of the index.

Quassnoi
@Quassnoi: I've used this hash technique before. Would you suggest `md5()`, `sha1()`, or other?
memnoch_proxy
Cryptographic strength is not much of an issue here. `MD4` is fastest, `MD5` is built into `MySQL`, so I'd use the former for the hashing outside `MySQL` and the latter for the hashing inside `MySQL`.
Quassnoi
A: 

+1 Quassnoi

Doing a lookup on hash would definitely be much faster. I created a similar style table but stored very long URIs, and had to use a TEXT type, so a composite key didn't apply there. URLs and URIs are often much longer than a char field can be. I've stored them broken down into multiple tables, and even then, after a few million URLs, I had to consider partitioning the URL/URIs tables. Roughly:

table hostnames:
    id int(11), 
    hostname char(255), 
    unique ( hostname )
table uri:
   id int(11),
   digest char(33),
   uri text,
   unique digest
table querystr: 
   -- like uri table
table urls:
   id int(11),
   fk_host int(11),
   fk_uri int(11),
   fk_query int(11)
   unique u ( id, fk_host, fk_uri, fk_query )

If you need to search through a few million URLs or URIs in a free-text manner, using LIKE '%foo%' won't work well. Consider a more text-search oriented MySQL extension or a library like Lucene/Solr for heavy searching.

memnoch_proxy