This is probably a really stupid question, but is there going to be much benefit in indexing a boolean field in a database table?

Given a common situation, like "soft-delete" records which are flagged as inactive, and hence most queries include WHERE deleted = 0, would it help to have that field indexed on its own, or should it be combined with the other commonly-searched fields in a different index?

+2  A: 

No.

You index fields that are searched on and have high selectivity/cardinality. A boolean field only has two possible values, so its cardinality is about as low as it gets in nearly any table. If anything, the index will make your writes slower (by an oh-so-tiny amount).
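To put numbers on "selectivity", here is a minimal Python/sqlite3 sketch. The pages table, its 10% soft-delete rate and the helper names selectivity() and fraction_matching() are assumptions made up for illustration. It shows why the flag is a poor index key: WHERE deleted = 0 still matches about 90% of the rows, so an index on it barely narrows the search.

```python
import sqlite3

# Hypothetical demo table: 1,000 pages, every 10th one soft-deleted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, deleted INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO pages (id, title, deleted) VALUES (?, ?, ?)",
    [(i, f"page {i}", 1 if i % 10 == 0 else 0) for i in range(1, 1001)],
)

def selectivity(column: str) -> float:
    """Distinct values divided by total rows; 1.0 means every value is unique."""
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM pages"
    ).fetchone()
    return distinct / total

def fraction_matching(predicate: str) -> float:
    """Share of the table a lookup on `predicate` would still have to visit."""
    (matching,) = conn.execute(f"SELECT COUNT(*) FROM pages WHERE {predicate}").fetchone()
    (total,) = conn.execute("SELECT COUNT(*) FROM pages").fetchone()
    return matching / total

print(selectivity("id"))                 # 1.0
print(selectivity("deleted"))            # 0.002
print(fraction_matching("deleted = 0"))  # 0.9
```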

Maybe, if every query takes soft deletes into account, you could make it the first field in the clustered index?
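A rough sketch of the clustered-index idea, using SQLite as a stand-in: a WITHOUT ROWID table is stored as a B-tree ordered by its PRIMARY KEY, which is about as close as SQLite gets to a clustered index. Putting deleted first in that key physically groups the live rows together. The table and column names are hypothetical, and other engines declare clustered indexes with their own syntax.

```python
import sqlite3

# Assumption: a WITHOUT ROWID table stands in for a clustered index; rows are
# stored ordered by PRIMARY KEY (deleted, id), so live rows sit together.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE pages (
           deleted INTEGER NOT NULL,
           id      INTEGER NOT NULL,
           title   TEXT,
           PRIMARY KEY (deleted, id)
       ) WITHOUT ROWID"""
)
conn.executemany(
    "INSERT INTO pages (deleted, id, title) VALUES (?, ?, ?)",
    [(1 if i % 10 == 0 else 0, i, f"page {i}") for i in range(1, 101)],
)

query = "SELECT id, title FROM pages WHERE deleted = 0 AND id = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()
# The fourth column of each plan row is the human-readable description.
for row in plan:
    print(row[3])  # e.g. SEARCH pages USING PRIMARY KEY (deleted=? AND id=?)

print(conn.execute(query, (42,)).fetchone())  # (42, 'page 42')
```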

Mark Canlas
+1  A: 

I think it would help, especially in covering indices.
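A minimal sketch of the covering-index case in SQLite, assuming a hypothetical users table: when every column a query touches is in the index, the engine can answer from the index alone, and the boolean flag rides along cheaply as an extra key column.

```python
import sqlite3

# Hypothetical users table with a soft-delete flag; half the rows are deleted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO users (email, name, deleted) VALUES (?, ?, ?)",
    [(f"user{i}@example.com", f"User {i}", i % 2) for i in range(100)],
)

# Composite index: the commonly searched column plus the flag. Putting email
# first means the same index also serves queries that filter on email alone.
conn.execute("CREATE INDEX idx_users_email_deleted ON users (email, deleted)")

query = "SELECT email FROM users WHERE email = ? AND deleted = 0"
details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("user4@example.com",))]
print(details)  # e.g. ['SEARCH users USING COVERING INDEX idx_users_email_deleted (email=? AND deleted=?)']
print(conn.execute(query, ("user4@example.com",)).fetchall())  # [('user4@example.com',)]
```

If the query also selected name, the index would no longer cover it and each match would need a lookup into the table.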

How much (or how little) of course depends on your data and queries.

You can have all sorts of theories about indices, but the final answer comes from the database engine running against real data. And often the answer is surprising (or maybe my theories are just bad ;)

Examine the query plans of your queries and decide whether the queries or the indices can be improved. It's quite simple to change an index and see what difference it makes.
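As a small illustration of that workflow, here is a sketch in SQLite (the table, index name and plan() helper are hypothetical) that captures EXPLAIN QUERY PLAN output before and after adding an index on the flag. Note that without ANALYZE statistics SQLite's planner does not know that 90% of rows match, so it will typically pick the index anyway; whether that is actually faster on your data is exactly what you would then measure.

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, deleted INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO pages (title, deleted) VALUES (?, ?)",
    [(f"page {i}", int(i % 10 == 0)) for i in range(1000)],
)

query = "SELECT title FROM pages WHERE deleted = 0"
before = plan(conn, query)  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_pages_deleted ON pages (deleted)")
after = plan(conn, query)   # now the planner can search via the new index

print("before:", before)  # e.g. ['SCAN pages']
print("after: ", after)   # e.g. ['SEARCH pages USING INDEX idx_pages_deleted (deleted=?)']
```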

Brimstedt
+1: I don't see the harm in indexing the column
OMG Ponies
+1  A: 

I think it would help if you used a view (WHERE deleted = 0) and queried that view regularly.
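A minimal sketch of the view approach in SQLite; the view name active_pages and the index are hypothetical. SQLite inlines a simple view like this into the outer query, so it is the index on the base table that does the work, and the view mainly saves callers from repeating the filter. Other engines may treat views differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, deleted INTEGER NOT NULL DEFAULT 0);
    CREATE INDEX idx_pages_deleted_title ON pages (deleted, title);
    -- The view bakes the soft-delete filter in, so callers never repeat it.
    CREATE VIEW active_pages AS SELECT id, title FROM pages WHERE deleted = 0;
    INSERT INTO pages (title, deleted) VALUES ('Home', 0), ('Old news', 1), ('About', 0);
    """
)

print(conn.execute("SELECT title FROM active_pages ORDER BY title").fetchall())
# [('About',), ('Home',)]

# The view is flattened into a query on pages, so the base-table index is used.
details = [
    row[3]
    for row in conn.execute("EXPLAIN QUERY PLAN SELECT title FROM active_pages WHERE title = 'Home'")
]
print(details)  # e.g. ['SEARCH pages USING COVERING INDEX idx_pages_deleted_title (deleted=? AND title=?)']
```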

astander
A: 

I think that if you refer to your boolean fields in many queries, it would make sense to move them to a separate table, for example DeletedPages or SpecialPages, holding many boolean-type fields such as is_deleted, is_hidden, is_really_deleted, requires_higher_user, etc., and then use joins to get at them.

Typically this table would be smaller, and you would gain something by using joins, especially in terms of code readability and maintainability. For this type of query:

SELECT * FROM pages WHERE is_deleted = 1

It would be faster to have it implemented like this:

SELECT pages.* FROM pages
INNER JOIN deleted_pages ON pages.id = deleted_pages.page_id
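A small sketch comparing the two designs in SQLite, with hypothetical table contents: a flag column on pages versus a deleted_pages side table with one row per deleted page. Both return the same deleted pages, but finding the live pages in the side-table design needs an anti-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Design A: the flag stored as a column on pages.
    CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, is_deleted INTEGER NOT NULL DEFAULT 0);
    -- Design B (this answer's suggestion): one row per deleted page in a side table.
    CREATE TABLE deleted_pages (page_id INTEGER PRIMARY KEY REFERENCES pages (id));

    INSERT INTO pages (id, title, is_deleted) VALUES (1, 'Home', 0), (2, 'Old news', 1), (3, 'Draft', 1);
    INSERT INTO deleted_pages (page_id) VALUES (2), (3);
    """
)

flag_column = conn.execute(
    "SELECT id, title FROM pages WHERE is_deleted = 1 ORDER BY id"
).fetchall()
side_table = conn.execute(
    "SELECT pages.id, pages.title FROM pages "
    "INNER JOIN deleted_pages ON pages.id = deleted_pages.page_id ORDER BY pages.id"
).fetchall()
print(flag_column)  # [(2, 'Old news'), (3, 'Draft')]
print(side_table)   # [(2, 'Old news'), (3, 'Draft')]

# The flip side: listing *live* pages in design B needs an anti-join.
live = conn.execute(
    "SELECT id, title FROM pages WHERE NOT EXISTS "
    "(SELECT 1 FROM deleted_pages WHERE deleted_pages.page_id = pages.id) ORDER BY id"
).fetchall()
print(live)  # [(1, 'Home')]
```

The sketch only shows that the designs return the same rows; it says nothing about which one is faster, and the anti-join for live pages is the kind of per-query join cost raised in the comment on this answer.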

I think I read somewhere that in MySQL a field needs a cardinality of at least 3 for an index on it to be useful, but please confirm this.

It's hard to say, given that a boolean is so thin and we don't have any data, but paying for a join on every single query would make queries slower, not faster, especially if the primary keys were clustered differently and the deleted_pages table were needed by every query.
Mark Canlas