tags:

views:

55

answers:

4

hey all I have tables with millions of rows in them and some of the select queries key off of 3 fields

company, user, articleid

would it be faster to create a composite index of those three fields as a key

or MD5 (company, user, articleid) together and then index the hash that's created.

?

thanks

A: 

A composite index seems to be the way to go, in particular since some of the individual keys appear to be fairly selective. The only situation which may cause you to possibly avoid the composite index approach is if the length of the composite key is very long (say in excess of 64 characters, on average).

While a MD5-based index would be smaller and hence possibly slightly faster, it would let you deal with the task of filtering the false positives out of the list of records with a given MD5 value.

When building a composite index, a question arise of the order in which the keys should be listed in the index. While this speak, somewhat, to the potential efficiency of the index, the question of the ordering has more significant impact on the potential usability of the index in cases when only two (or even one...) of the keys are used in the query. One typically tries and put the most selective column(s) first, unless this (these) selective column(s) is (are) the ones most likely to not be used when a complete set of these columns is not found in the query.

mjv
A: 

You would have to benchmark to be sure, but I believe that you will find that there isn't going to be a significant performance difference between a composite index of three fields and a single index of a hash of those fields.

In my opinion, creating data that wouldn't otherwise exist and is only going to be used for indexing is a bad idea (except in the case of de-normalization for performance reasons, but you'd need a conclusive case to do it here). For a 32 byte field of md5 data (minus any field overhead), consider that for every one million rows you have, you have created approximately an extra 30 MB of data. Even if the index was a teensy tiny bit faster, you've just upped the disk and memory requirements for that table. Your index seek time might be offset by disk seek time. Add in the fact that you have to have application logic to support this field, and I would opine that it's not worth it.

Again, the only true way to know would be to benchmark it, but I don't think you'll find much of a difference.

zombat
A: 

for performance, you might see advantages with the composite index. if you are selecting only the fields in the index, this is a "covering index" situation. that means the data engine will not have to read the actual data page from the disk, just reading the index is enough to return the data requested by your application. this can be a big performance boost. if you store a hash, you eliminate the possibility of taking advantage of a covering index (unless you are selecting only the hash in your sql).

best regards, don

Don Dickinson
oh, i should add that this might be database specific. i know mysql can take advantage of covering indices and i *think* mssql can. i have no idea about other engines.
Don Dickinson
A: 

One more consideration in favor of a composite key : having composite key on (company, user, articleid) means that it can be used when you search a record by company, or company+user, or by company+user+articleid. So you virtually have 3 indexes.

a1ex07