I have a MySQL InnoDB table with a status column. The status can be 'done' or 'processing'. As the table grows, at most 0.1% of the status values will be 'processing', whereas the other 99.9% will be 'done'. This seems like a great candidate for an index because 'processing' is highly selective (though 'done' is not). Is it possible to create an index on the status column that only indexes the value 'processing'? I do not want the index to waste an enormous amount of space indexing 'done'.
Better solution: don't use strings to indicate statuses. Instead, define constants in your code with descriptive names that map to integer values. Only the integer is stored in the database, which keeps the column (and any index on it) smaller than a string column, so MySQL can compare and index it considerably faster.
I don't know what language you use, but for example in PHP:
```php
class Member
{
    const STATUS_ACTIVE = 1;
    const STATUS_BANNED = 2;
}

if ($member->getStatus() == Member::STATUS_ACTIVE)
{
    // handle an active member
}
```
instead of what you have now:
```php
if ($member->getStatus() == 'active')
{
    // handle an active member
}
```
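For comparison, the same pattern in Python, as a minimal sketch: an `IntEnum` supplies the descriptive names, and only the integer is written to the table. The `Status` values, the `jobs` table, and the use of an in-memory SQLite database instead of MySQL are all assumptions for illustration.

```python
import sqlite3
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical status codes; only the integers reach the database."""
    PROCESSING = 1
    DONE = 2


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL table
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status INTEGER NOT NULL)")

# Store the integer value, never the name.
conn.execute("INSERT INTO jobs (status) VALUES (?)", (Status.PROCESSING.value,))
conn.execute("INSERT INTO jobs (status) VALUES (?)", (Status.DONE.value,))

# Read it back and turn the integer into a named constant again.
(raw,) = conn.execute("SELECT status FROM jobs WHERE id = 1").fetchone()
status = Status(raw)
if status == Status.PROCESSING:
    print(f"job 1 is {status.name}")
```

Code compares against `Status.PROCESSING`, never against a literal `1`, so the meaning stays readable while the column holds small integers.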
I'm not aware of any standard way to do this, but we have solved a similar problem before by using two tables, `Processing` and `Done` in your case, the former with an index, the latter without.
Assuming that rows never switch back from `done` to `processing`, here are the steps you can use:

- When you create a record, insert it into the `Processing` table with the column set to `processing`.
- When it's finished, set the column to `done`.
- Periodically sweep the `Processing` table, moving `done` rows to the `Done` table.
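The three steps can be sketched as follows, using Python's built-in `sqlite3` module as a stand-in for MySQL (SQLite is an assumption here; the SQL is plain enough to port). The table names `processing` and `done`, the `payload` column, and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Small, hot table: the only one with a status index.
conn.execute(
    "CREATE TABLE processing (id INTEGER PRIMARY KEY, payload TEXT, status TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_processing_status ON processing (status)")
# Large, cold table: same columns, no status index.
conn.execute(
    "CREATE TABLE done (id INTEGER PRIMARY KEY, payload TEXT, status TEXT NOT NULL)"
)


def create_record(payload):
    """Step 1: every new record starts in the processing table."""
    cur = conn.execute(
        "INSERT INTO processing (payload, status) VALUES (?, 'processing')", (payload,)
    )
    conn.commit()
    return cur.lastrowid


def finish_record(record_id):
    """Step 2: mark the record done; it stays put until the next sweep."""
    conn.execute("UPDATE processing SET status = 'done' WHERE id = ?", (record_id,))
    conn.commit()


def sweep():
    """Step 3: move finished rows to the done table in one transaction."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute("INSERT INTO done SELECT * FROM processing WHERE status = 'done'")
        conn.execute("DELETE FROM processing WHERE status = 'done'")


a = create_record("job A")
b = create_record("job B")
finish_record(a)
sweep()
```

After the sweep, only the unfinished "job B" remains in the indexed table. In MySQL the two sweep statements would go between `START TRANSACTION` and `COMMIT`.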
That last step can be tricky. You can do the insert/delete in a transaction to ensure each row transfers properly, or you can use a unique ID to detect whether a row has already been transferred and then just delete it from `Processing` (I have no experience with MySQL transaction support, which is why I'm also giving that option).
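The unique-ID option can be sketched without any wrapping transaction: the copy is idempotent because `id` is the primary key of the done table, and the delete only removes rows already present there, so a sweep interrupted between the two statements can simply be rerun. SQLite in autocommit mode again stands in for MySQL; SQLite spells the skip-duplicates insert `INSERT OR IGNORE`, and the MySQL counterpart is `INSERT IGNORE`. Table and function names are hypothetical.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode: each statement stands alone.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE processing (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("CREATE TABLE done (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO processing (id, status) VALUES (?, ?)",
    [(1, "done"), (2, "processing"), (3, "done")],
)


def sweep_without_transaction():
    # Copy finished rows; a row whose id is already in done is skipped,
    # so repeating this statement never creates duplicates.
    conn.execute(
        "INSERT OR IGNORE INTO done (id, status) "
        "SELECT id, status FROM processing WHERE status = 'done'"
    )
    # Delete only rows that are confirmed to exist in the done table.
    conn.execute(
        "DELETE FROM processing "
        "WHERE status = 'done' AND id IN (SELECT id FROM done)"
    )


# Simulate an earlier sweep that crashed after copying row 1 but before deleting it.
conn.execute("INSERT INTO done (id, status) VALUES (1, 'done')")
sweep_without_transaction()
```

Running the sweep again afterwards changes nothing, which is what makes it safe to retry.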
That way, you're only indexing a few of the 99.9% of `done` rows: the ones that have yet to be transferred to the `Done` table. It will also work with multiple states of processing, as you have alluded to in comments (entries are only transferred when they reach the `done` state; all other states stay in the `Processing` table).
It's akin to moving historical data (stuff that will never change again) to a separate table for efficiency. It can complicate queries that need both `done` and non-`done` rows, since you have to query two tables, so be aware there's a trade-off.