tags:

views:

76

answers:

3

I've got a table in a MySQL database that has the following fields:

ID | GENDER | BIRTHYEAR | POSTCODE

Users can search the table using any of the fields in any combination (i.e., SELECT * FROM table WHERE GENDER = 'M' AND POSTCODE IN (1000, 2000); or SELECT * FROM table WHERE BIRTHYEAR = 1973;)

From the MySQL docs, it uses left indexing. So if I create an index on all 4 columns it won't use the index if the ID field isn't used. Do I need to create an index for every possible combination of field (ID; ID/GENDER; ID/BIRTHYEAR; etc.) or will creating one index for all fields be sufficient?

If it makes any difference, there are upwards of 3 million records in this table.

A: 

Use EXPLAIN.

(I'd say, use Postgres, too, lol).

It seems recent versions of MySQL can use several indexes in the same query, they call this Index Merge. In this case 1 index per column will be enough.

Gender is a special case, since selectivity is 50% you don't need an index on it, it would be counterproductive.

peufeu
Actually, our users aren't required to give their gender, so there are 3 options: M, F and O. :) Since it's only 3 options, would an index still not be necessary?
cabuki
Indexes don't help when you select more than about 5-15% of your rows. The threshold depends on many factors, but you get the idea : the index lookup has a cost.
peufeu
A: 

Creating indexes on single fields is useful but it would be really useful if your data was of varchar type and each record had a different value, since birthyear and postcode are numbers they are already well indexed.

You can index birthyear because it should be different for many of the records (but up to 120 birthyears in total at max I guess).

Gender in my opinion doesn't need an index.

You can find out what field combinations are most likely to give different results and index those, like: birthyear - postcode, id - birthyear, id - postcode.

Lex
+1  A: 

In this situation I typically log search criteria, number of results returned and time taken to perform the search. Just because you're creating the flexibility to search by any field doesn't mean your users make use of this flexibility. I'd normally create indexes on sensible combinations and then once I've determined the usage patterns drop the lowly used indexes or create new unsuspected indexes.

I'm not sure if MySQL supports statistics or histograms for skewed data but the index on gender may or may not work. If MySQL supports statistics then this will indicate the selectivity of an index. In a general population an index on a field with a 50/50 split won't help. If you're sample data is computer programmers and the data is 95% males then a search for females would use the index.

Dave Barker