I have a MySQL (InnoDB) table 'items' with the following characteristics:

  1. A large number of rows, which keeps increasing.
  2. A large number of columns of various data types, including 'text'.
  3. A primary key, 'item_id'.

There are additional requirements as follows:

  1. Need to query items based on their status
  2. Need to update status

The above two operations happen quite frequently.

Given the above scenario, I have two questions:

  1. Would making a separate table with two columns namely item_id and status with item_id as primary key provide increased performance?
  2. If the above is true, how am I going to tackle querying item_ids based on status?

I am inexperienced in handling databases. I hope you will bear with me :)

+1  A: 

It would make more sense to create an index on status if status and item_id are the only columns you need to fetch.

create index status_item_id_items on items (status)

A query like the following can then use this index:

select item_id, status from items where status = 'status'

Keep in mind that if you don't have many different statuses, your query may end up returning a lot of rows and could be slow. If you can also constrain it by a more selective column, such as a datetime, that would be better.
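
As a rough sketch of that last point, assuming a hypothetical created_at DATETIME column (the question does not mention one) and a made-up status value 'pending', a composite index lets MySQL narrow by status first and then by date range. EXPLAIN should show whether the index is actually chosen:

```sql
-- Hypothetical: assumes items has a created_at DATETIME column.
CREATE INDEX idx_items_status_created ON items (status, created_at);

-- Equality on status plus a range on created_at can use the composite index.
EXPLAIN
SELECT item_id, status
FROM items
WHERE status = 'pending'
  AND created_at >= '2010-01-01';
```

InnoDB secondary indexes also store the primary key, so item_id should be readable from the index itself without touching the wide row.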

Vincent
Thanks Vincent. But if I index both item_id and status, wouldn't updating the status values get slower as the table grows? Or wouldn't it matter much, since the index has just two columns?
nano
For updating one row at a time, which is probably your use case, the cost should be negligible. It is true that you have one more index to update, but creating a new table would cost much more.
Vincent
Thanks. Really appreciate your help.
nano
A: 

Answering part 2 first, you'd do an inner join of your two tables:

SELECT i.*, s.StatusCode FROM items AS i INNER JOIN status AS s ON s.item_id = i.item_id

To answer part 1, though, I don't think doing this would gain you any performance advantage.
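
If you did split the table anyway, querying by status might look something like the sketch below. It assumes the second table is named status with columns item_id and StatusCode, as in the join above; the index name and the 'active' value are made up for illustration:

```sql
-- Hypothetical index so lookups by status do not scan the whole table.
CREATE INDEX idx_status_code ON status (StatusCode);

-- Only the item_ids for one status: no join needed.
SELECT s.item_id
FROM status AS s
WHERE s.StatusCode = 'active';

-- Full item rows for one status.
SELECT i.*, s.StatusCode
FROM items AS i
INNER JOIN status AS s ON s.item_id = i.item_id
WHERE s.StatusCode = 'active';
```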

peejaybee
+3  A: 

This is called vertical segmentation. It is often used when a data entity has multiple access patterns that touch different subsets of the entity's attributes (table columns) at different frequencies. If one function needs access to only one or two columns hundreds of times per second, and another application function needs access to all the other columns but only once or twice a day, then this approach is warranted and can yield a substantial performance improvement.

Basically, as you suggested, you "split" the table into two tables, both with the same key, with a one-to-one FK/PK->PK relationship. In one table you put only those few columns that are accessed more frequently, and you put the rest of the columns in the other table that will be accessed less frequently. You can then apply indexing to each table more appropriately based on the actual access pattern for each table separately.
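
A minimal sketch of such a split for this question, assuming items.item_id is INT UNSIGNED and status fits in a short string (the table, index, and constraint names are hypothetical):

```sql
-- items keeps the wide, rarely read columns;
-- items_status holds only the frequently read and updated column.
CREATE TABLE items_status (
    item_id INT UNSIGNED NOT NULL,
    status  VARCHAR(20)  NOT NULL,
    PRIMARY KEY (item_id),
    -- Index tuned for the "find items by status" access pattern.
    KEY idx_items_status_status (status),
    -- One-to-one relationship: same key as items, enforced by a foreign key.
    CONSTRAINT fk_items_status_item
        FOREIGN KEY (item_id) REFERENCES items (item_id)
        ON DELETE CASCADE
) ENGINE=InnoDB;
```

The column type of item_id here has to match the one in items for the foreign key to be accepted.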

Charles Bretana
Thanks for the answer.
nano