views:

82

answers:

4

I'm using Mysql 5.0 and am a bit new to indexes. Which of the following queries can be helped by indexing and which index should I create?

(Don't assume either table to have unique values. This isn't homework, its just some examples I made up to try and get my head around indexing.)

Query1:
Select a.*, b.*
From a
Left Join b on b.type=a.type;

Query2:
Select a.*, b.*
From a,b
Where a.type=b.type;

Query3:
Select a.*
From a
Where a.type in (Select b.type from b where b.brand=5);

Here is my guess for what indexes would be use for these different kinds of queries:

Query1:
Create Index Query1 Using Hash on b (type);

Query2:
Create Index Query2a Using Hash on a (type);
Create Index Query2b Using Hash on b (type);

Query3:
Create Index Query2a Using Hash on b (brand,type);

Am I correct that neither Query1 or Query3 would utilize any indexes on table a?

I believe these should all be hash because there is only = or !=, right?

Thanks

+1  A: 

query 3 is invalid, but i assume you meant

where a.type in ....

Query 1 is the same as query two, just better syntax, both probably have the same query plan and both will use both indexes.

Query 3 will use the index on b.brand, but not the type portion of it. It would also use an index on a.type if you had one.

You are right that they should be hash indexes.

Paul Creasey
I corrected my typo, thanks.
Dan
sorry? Query 1 is an outer join, it will output all values from 'a' anyway (while Query 2 will not)
Dmitry
+1  A: 

using the explain command in mysql will give a lot of great info on what mysql is doing and how a query can be optimized.

in q1 and q2: an index on (a.type, all other a cols) and one on (b.type, all other b cols) in q3: an index on (a.b_type, all other a cols) and one on b (brand, type)

ideally, you'd want all the columns that were selected stored directly in the index so that mysql doesn't have to jump from the index back to the table data to fetch the selected columns. however, that is not always manageable (i.e.: sometimes you need to select * and indexing all columns is too costly), in which case indexing just the search columns is fine.

so everything you said works great.

jspcal
Wow... yeah the mysql documentation didn't mention anything about indexing non-join columns. Thanks for the tip!
Dan
The better approach is to use a clustered index where you can, of course that isn't always possible.
Paul Creasey
yeah clustered is great. in mysql at least it can only be on a column that is unique. ms sql server has better support for it for sure.
jspcal
You mean the clustered index or covering index is great? The clustered index is on the primary key and is how the row data is stored. A covering index is any index that includes all the columns used in the query. You can definitely add non-unique columns to an index to make it a covering index--just be careful about the index size.
Ken Fox
+1  A: 

Query 3 could utilize an index on a.type if the number of b's with brand=5 is close to zero

Query2 will utilize indices if they are B-trees (and thus are sorted). Using hash indices with index-join may slow down your query (because you'll have to read Size(a) values in non-sequential way)

Dmitry
+1  A: 

Query optimization and indexing is a huge topic, so you'll definitely want to read about MySQL and the specific storage engines you're using. The "using hash" is supported by InnoDB and NDB; I don't think MyISAM supports it.

The joins you have will perform a full table or index scan even though the join condition is equality; Every row will have to be read because there's no where clause.

You'll probably be better off with a standard b-tree index, but measure it and investigate the query plan with "explain". MySQL InnoDB stores row data organized by primary key so you should also have a primary key on your tables, not just an index. It's best if you can use the primary key in your joins because otherwise MySQL retrieves the primary key from the index, then does another fetch to get the row. The nice exception to that rule is if your secondary index includes all the columns you need in the query. That's called a covering index and MySQL will not have to lookup the row at all.

Ken Fox
Just looked up the index types at http://dev.mysql.com/doc/refman/5.0/en/create-index.html. MyISAM doesn't support hash indexes. I was wrong about InnoDB--it doesn't support hash indexes either. It has an adaptive hashing system built from the b-tree index.
Ken Fox