I have a MySQL InnoDB table with a status column whose value is either 'done' or 'processing'. As the table grows, at most 0.1% of the rows will be 'processing', and the other 99.9% will be 'done'. This seems like a great candidate for an index, since 'processing' is highly selective (though 'done' is not). Is it possible to create an index on the status column that covers only the value 'processing'? I do not want the index to waste an enormous amount of space indexing 'done'.

A: 

Better solution: don't use strings to indicate statuses. Instead, define constants with descriptive names in your code that map to integer values. The integer is what gets stored in the database, and comparing and indexing integers is generally cheaper for MySQL than comparing strings.

I don't know what language you use, but for example in PHP:

class Member
{
   const STATUS_ACTIVE = 1;
   const STATUS_BANNED = 2;
}

if ($member->getStatus() == Member::STATUS_ACTIVE)
{
}

instead of what you have now:

if ($member->getStatus() == 'active')
{
}
Coronatus
Thanks for the answer. The strings are in fact ENUMs, which means they are being mapped to integers. While your suggestion is a valid one, it does not get to the root of my question: Is it necessary to, and if so how do I, index only a specific value in a column?
BrainCore
+2  A: 

I'm not aware of any standard way to do this, but we have solved a similar problem before by using two tables (Processing and Done, in your case): the former with an index, the latter without.

Assuming that rows never switch back from done to processing, here are the steps you can use:

  1. When you create a record, insert it into the Processing table with the column set to processing.
  2. When it's finished, set the column to done.
  3. Periodically sweep the Processing table, moving done rows to the Done table.

That last step can be tricky. You can do the insert and delete in a single transaction to ensure each row transfers properly, or you could use a unique ID to detect whether a row has already been transferred and then just delete it from Processing. (I have no experience with MySQL transaction support, which is why I'm also giving that option.)
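As a rough illustration of the transactional sweep, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs without a server. The table names (processing, done), the columns (id, payload, status), and the sweep function are all hypothetical, and the SQL may need adjusting for a real InnoDB schema.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all table and column names are made up.
# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries below are controlled explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE processing (id INTEGER PRIMARY KEY, payload TEXT, status TEXT NOT NULL);
    CREATE INDEX idx_processing_status ON processing (status);  -- small, indexed table
    CREATE TABLE done (id INTEGER PRIMARY KEY, payload TEXT, status TEXT NOT NULL);  -- no status index
""")

def sweep(conn):
    """Move finished rows from processing to done in one transaction.

    Returns the number of rows moved. The ids are collected first so the
    INSERT and the DELETE act on exactly the same set of rows.
    """
    conn.execute("BEGIN")
    try:
        ids = [(row[0],) for row in
               conn.execute("SELECT id FROM processing WHERE status = 'done'")]
        conn.executemany(
            "INSERT INTO done (id, payload, status) "
            "SELECT id, payload, status FROM processing WHERE id = ?", ids)
        conn.executemany("DELETE FROM processing WHERE id = ?", ids)
        conn.execute("COMMIT")
        return len(ids)
    except Exception:
        conn.execute("ROLLBACK")  # neither table changes if any step fails
        raise

# Usage: three new records, two of which finish before the sweep runs.
conn.executemany(
    "INSERT INTO processing (payload, status) VALUES (?, 'processing')",
    [("a",), ("b",), ("c",)])
conn.execute("UPDATE processing SET status = 'done' WHERE payload IN ('a', 'b')")
moved = sweep(conn)
remaining = conn.execute("SELECT COUNT(*) FROM processing").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM done").fetchone()[0]
```

Because the primary key is carried over unchanged, it also serves as the unique ID mentioned above: a row whose id already exists in the done table has been transferred.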

That way, you're only indexing a few of the 99.9% of done rows: the ones that have yet to be transferred to the Done table. It will also work with multiple processing states, as you have alluded to in comments: entries are transferred only when they hit the done state, and all other states stay in the Processing table.

It's akin to moving historical data (stuff that will never change again) to a separate table for efficiency. It can complicate queries that need access to both done and non-done rows, since you have to combine the two tables, so be aware there's a trade-off.
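One possible way to soften that trade-off is a view that stitches the two tables back together with UNION ALL. The sketch below is an assumption rather than part of the setup described above. It again uses Python's sqlite3 in place of MySQL, and the names processing, done, and all_jobs are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and view names are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processing (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE done (id INTEGER PRIMARY KEY, status TEXT NOT NULL);

    -- A view that presents both tables as one logical table.
    CREATE VIEW all_jobs AS
        SELECT id, status FROM processing
        UNION ALL
        SELECT id, status FROM done;

    INSERT INTO processing VALUES (3, 'processing');
    INSERT INTO done VALUES (1, 'done'), (2, 'done');
""")

# Queries that need every row read from the view instead of either table.
rows = conn.execute("SELECT id, status FROM all_jobs ORDER BY id").fetchall()
```

Queries that only care about in-flight work would still go straight to the small, indexed table.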

paxdiablo