I have a map from strings to integers. To store this map in a MySQL database I created the following table:

CREATE TABLE map(
  Argument TEXT NOT NULL,
  Image INTEGER NOT NULL
)

I chose the TEXT type for the argument because its length is unpredictable; currently the longest record has 2290 characters and the average length is 88 characters.

After running into performance problems, I tried to add an index on the Argument column, but found that I must specify a prefix length. To avoid this limitation, I added a new integer column containing hash values (MD5 or similar) of the Argument column values.

ALTER TABLE map ADD COLUMN ArgumentHash INTEGER;

And a combined index:

CREATE INDEX argument_index USING HASH ON map(ArgumentHash, Argument(80));

Since then the performance problems have disappeared. I'd like to ask whether this is a correct way to solve the problem.

A: 

I don't think there is a "correct" way; it depends on what you are using the column for.

In my experience, it is unusual to need or want to select on a large text column; the text is usually data retrieved by some other key (unless it is indexed in some other way, e.g. full-text search or Lucene, but that doesn't appear to be what you are doing).

If you do in fact need an exact match on a large field, then it may be more efficient to use the hash, as it will likely let you keep the index smaller. My guess is that if you would need an index prefix longer than the size of the hash (this depends on how close to the start of the TEXT the values generally differ), you should use the hash.
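As a rough sketch of what a hash-based exact-match lookup could look like: the statements below assume the hash is MySQL's built-in CRC32() rather than MD5 (an MD5 digest is 128 bits and would not fit in an integer column), and that ArgumentHash is changed to INT UNSIGNED because CRC32() returns an unsigned 32-bit value. The literal 'some argument string' is a placeholder. Also note that on InnoDB or MyISAM tables USING HASH is accepted but a B-tree index is built, which is fine for this purpose.

```sql
-- Assumption: CRC32 is used as the hash; adjust if you use a different function.
-- CRC32() returns an unsigned 32-bit value, so the column must be unsigned.
ALTER TABLE map MODIFY COLUMN ArgumentHash INT UNSIGNED;

-- Populate the hash for existing rows.
UPDATE map SET ArgumentHash = CRC32(Argument);

-- Exact-match lookup: the hash narrows the search through the index,
-- and the comparison on Argument filters out hash collisions.
SELECT Image
FROM map
WHERE ArgumentHash = CRC32('some argument string')
  AND Argument = 'some argument string';
```

Keeping the comparison on Argument in the WHERE clause matters: different strings can share a hash value, so the hash alone does not guarantee a match.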

Your best bet is to try it and see. Profile both approaches with representative data and find out.
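One way to start such a comparison is to look at the query plans with EXPLAIN. The sketch below assumes a hypothetical prefix-only index named argument_prefix_index alongside the existing argument_index, a hash column filled with CRC32(Argument), and a placeholder search string; none of these come from the question itself.

```sql
-- Hypothetical prefix-only index to compare against the hash-based one.
CREATE INDEX argument_prefix_index ON map(Argument(80));

-- Plan for a lookup that can only use the prefix index.
EXPLAIN
SELECT Image
FROM map
WHERE Argument = 'some argument string';

-- Plan for a lookup that supplies the hash as well
-- (assumes ArgumentHash was populated with CRC32(Argument)).
EXPLAIN
SELECT Image
FROM map
WHERE ArgumentHash = CRC32('some argument string')
  AND Argument = 'some argument string';
```

The key, key_len, and rows columns of the EXPLAIN output show which index was chosen and roughly how many rows MySQL expects to examine; timing the real queries against representative data is still the deciding measurement.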

Brenton Alker