views:

380

answers:

5

I have a table that has a foreign key to a table that stores some blob data. When I do an inner join on the tables with a condition on the main table the join type goes from 'index' to 'ALL'. I would like to avoid this as my blob table is on the order of tens of gigabytes. How can I avoid it?

Here is the the basic inner join:

EXPLAIN SELECT m.id, b.id, b.data 
        FROM metadata m, blobstore b 
        WHERE m.fkBlob = b.id;

1, 'SIMPLE', 'm', 'index', 'fk_blob', 'fk_blob', '4', '', 1, 'Using index'
1, 'SIMPLE', 'b', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'blob_index.m.fkBlob', 1, ''

Here I add a condition on the main table:

EXPLAIN SELECT m.id, b.id, b.data 
        FROM metadata m, blobstore b 
        WHERE m.fkBlob = b.id AND m.start < '2009-01-01';
1, 'SIMPLE', 'b', 'ALL', 'PRIMARY', '', '', '', 1, ''
1, 'SIMPLE', 'm', 'ref', 'fk_blob,index_start', 'fk_blob', '4', 'blob_index.b.id', 1, 'Using where'

Notice that the order in which the tables are listed has changed. It is now doing a full table scan on the blob table because of a condition I've added regarding the main table.

Here is the schema:

 DROP TABLE IF EXISTS `blob_index`.`metadata`;
    CREATE TABLE  `blob_index`.`metadata` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `fkBlob` int(10) unsigned NOT NULL,
      `start` datetime NOT NULL,
      PRIMARY KEY (`id`),
      KEY `fk_blob` (`fkBlob`),
      KEY `index_start` (`start`),
      CONSTRAINT `fk_blob` FOREIGN KEY (`fkBlob`) REFERENCES `blobstore` (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;


    DROP TABLE IF EXISTS `blob_index`.`blobstore`;
    CREATE TABLE  `blob_index`.`blobstore` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `data` mediumblob NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
+3  A: 

I guess you are trying this on empty table (because MySQL thinks it needs to go through one row to do full table scan), what might influence results of scheduler. When you will do it on real table, the EXPLAIN results might vary (and actually did vary in my test).

Michal Čihař
Your right, I threw some data in there, ran analyze table on both tables, and now the first join is of type range on the meta table.
A: 

If I read what you posted right it goes from index to ref and eq_ref to all.

CREATE INDEX idx_metadata USING BTREE ON `metadata` (fkBlob,start);

Should take it right back.

Don
+2  A: 

The optimizer thinks you query will benefit from swapping the table order (which most probably means that statistics are not up to date).

You can try adding an index on metadata (start, fkBlob):

CREATE INDEX ix_metadata_start_blob ON metadata (start, fkBlob)

and run ANALYZE TABLE on both tables.

This way, the index on start will be used for filtering on metadata which will be made leading.

You can also explicitly force the order of the join:

SELECT  *
FROM    metadata m
STRAIGHT_JOIN
        blobstore b
ON      b.id = m.fkBlob
WHERE   m.start <= '2009-01-01'

, though it's usually not recommended.

Quassnoi
A: 
if the index doesnot take it right use HINTS

select /* INDEX <index_name> */
blah blah blah
from ........
Venkataramesh Kommoju
A: 

In the first example, MySQL used the metadata fk_blob index because it was a covering index--every column you used in the query was present in the index. (This is what "using index" means.) That query still did a full scan, but it scanned every row via the secondary index instead of the primary. As soon as you used start, you lost the covering index and MySQL calculated it was faster to use blobstore as the driving index. (InnoDB's primary index is integrated with the row storage.)

If you want MySQL to continue to use a metadata index as the driving index, make sure there's a single index on it that will be useful for the query. An index on (start, fkBlob) would be best for the second query, but that may not be useful for other queries. The next best index is to replace (fkBlob) with (fkBlob, start). You'll have to balance having too many indexes (which are expensive to maintain) vs having efficient query plans. Test, test, test--and never blindly believe explain on your dev database.

Ken Fox