ansaurus

Question

Why does MySQL not use an index on a int field that's being used as a boolean?

Answer 1

+2 A:

For MySQL to use the index, you have to explicitly compare the int field to a value (e.g. true, 1).

select * from myTable where myInt = true

Senseful 2008-12-24 15:15:15

Answer 2

+2 A:

I'm not a database expert, but doesn't it defeat the purpose of having an index on the field if there is only two possible values of the field?

If all of the fields in the indexed column are unique, then the database engine can do an index scan to find the relevant rows. If there are only two possible values - then I don't see the purpose of having that field indexed. The DB engine has to do the same operation that it would if the index did not exist.

Perhaps MySQL is not showing it as a possible key because the engine has discarded the idea of using the index in the execution plan?

matt b 2008-12-24 15:18:34

I clarified in the question that the index is not unique.

Senseful 2008-12-24 15:26:58

Indexes are not restricted to unique columns. Indexes are basically look up tables that sort/hash based on the columns in the index. The purpose of an index is to tell the RDMS to optimize searches for certain columns in a table. Indexes on bools may not do much for performance, but it is valid.

Bernard 2008-12-24 15:31:36

Right, I wasn't asking if it was valid or not, I was asking if it was useful. The field that is indexed will ever only be 0 or 1, so the index will contain two entries, each with N / 2 rows in it where N = size of table. Correct? So what's the point of indexing this field?

matt b 2008-12-24 15:51:50

Answer 3

A:

Your question's SQL looks malformed to me. Are you looking for non-null values of the column? This should use the index:

select * from myTable where myInt is not null

Kieveli 2008-12-24 15:34:35

MySQL evaluates any non-zero/null expression as true so for example the statement "SELECT * FROM myTable WHERE 1" is a valid statement and will return all records from the table.In myInt field, I am storing 0 or 1, not NULL and NOT NULL values.

Senseful 2008-12-24 15:41:45

@eagle: you're storing two different non-null values (0 and 1). And MySQL is allowing you to use a non-standard abbreviation; your SQL will not migrate to other DBMS easily if you use the shorthand.

Jonathan Leffler 2008-12-24 17:16:42

Answer 4

+1 A:

There are lots of factors to consider.

One factor that should not enter into it is the notation used in the question. When the column is a boolean, then these conditions should be treated by the optimizer as identical:

SELECT * FROM MyTable WHERE MyInt;

SELECT * FROM MyTable WHERE MyInt != 0;

SELECT * FROM MyTable WHERE MyInt IS TRUE;

SELECT * FROM MyTable WHERE MyInt = TRUE;

There may be other equivalent formulations. The first of these is not standard SQL (even if the type of MyInt is BOOLEAN; the others are standard. But the optimizer should simply transform the shorthand into the appropriate long form and then behave the same as if the long form was written by the user. (If the optimizer does not do this, then there is arguably a problem with the optimizer; the query should be reduced to a canonical form before deciding how to process the query. However, there are often blind spots in even the best optimizers. Learning how to avoid those is an art form, and inherently DBMS-specific.)

The optimizer uses an index when it believes the index will boost performance of the query. When the index won't boost performance, it is ignored (if the optimizer is any good). Sometimes, that depends on whether the statistics for the index are up to date.

In data warehousing systems, the system can be designed and configured to make sequential scans of the table very fast; in such systems, if the selectivity of an index is such that using it will pull more than as little as 25% of the rows, it can actually be quicker to do the full table scan than to use the index.

Think about it. When reading via an index, the DBMS has to do at least two reads; it reads the information about the row from the index page, and then it has to read the row from the data page.

Some DBMS provide index-only tables. All the data is in the index. Other DBMS provide a mechanism such that you can say "index is unique on columns A, B, C; however, include columns D and E in the data too". Then if the query requires data from A, B, C, D or E (or any combination) and there's no filtering on other columns, the DBMS only has to scan the index, not the table pages too.

Typically, you get many index rows to a page. However, for some tables, reading an index may require reading more data than reading the rows. Consider the archetypal many-to-many mapping table containing two (4-byte) integer ID values. That requires 8 bytes per row in the data pages, but the index probably requires 4-8 bytes of overhead (because the index key entry stores the two ID values plus the information needed to locate the corresponding row on disk). So, an index scan there may involve twice as much disk I/O as the data scan, even if the index scan is done 'index only'.

This is barely touching the surface of the possible reasons for using or not using an index.

Jonathan Leffler 2008-12-24 17:12:35

ansaurus

tags:

views:

answers:

Why does MySQL not use an index on a int field that's being used as a boolean?

related questions