views:

373

answers:

4

I have a SQL query that takes a very long time to run on MySQL (it takes several minutes). The query is run against a table that has over 100 million rows, so I'm not surprised it's slow. In theory, though, it should be possible to speed it up as I really only want to get back the rows from the large table (let's call it A) that have a reference in another table, B.

So my query is:

SELECT id FROM A, B where A.ref = B.ref;

(A has over 100 million rows; B has just a few thousand).

I've added INDEXes:

alter table A add index(ref);
alter table B add index(ref);

But it's still very slow (several minutes -- I'd be happy with one minute).

Unfortunately, I'm stuck with MySQL 4.1.22, so I can't use views.

I'd rather not copy all of the relevant rows from A into a separate, smaller table, as the rows that I need will change from time to time. On the other hand, at the moment that's the only solution I can think of.

Any suggestions welcome!

EDIT: Here's the output of running EXPLAIN on my query:

+----+-------------+------------------------+------+------------------------------------------+-------------------------+---------+------------------------------------------------+-------+-------------+
| id | select_type | table                  | type | possible_keys                            | key                     | key_len | ref                                            | rows  | Extra       |
+----+-------------+------------------------+------+------------------------------------------+-------------------------+---------+------------------------------------------------+-------+-------------+
|  1 | SIMPLE      | B                      | ALL  | B_ref,ref                                | NULL                    |    NULL | NULL                                           | 16718 | Using where |
|  1 | SIMPLE      | A                      | ref  | A_REF,ref                                | A_ref                   |       4 | DATABASE.B.ref                                 |  5655 |             |
+----+-------------+------------------------+------+------------------------------------------+-------------------------+---------+------------------------------------------------+-------+-------------+

(In redacting my original query example, I chose to use "ref" as my column name, which happens to be the same as one of the types, but hopefully that's not too confusing...)

Thanks,

Ben

A: 

Use joins.

Bombe
does mysql really do different things depending on whether JOIN syntax is used or not?
David Schmitt
As far as I know, select * from A, B is the same as select * from a inner join B... Is there a particular kind of join I could use that would make my query faster? I've tried a few combinations and none of them seem to make any difference...
Ben
MySQL should behave the same whether JOIN is used explcitily or not.
MarkR
A: 

SELECT id FROM A JOIN B ON A.ref = B.ref

You may be able to optimize further by using an appropriate type of join e.g. LEFT JOIN

http://en.wikipedia.org/wiki/Join_(SQL)

Adam Pope
hes already doing an inner, just a different syntax
Shawn Simon
+2  A: 

The query optimizer is probably already doing the best that it can, but in the unlikely event that it's reading the giant table (A) first, you can explicitly tell it to read B first using the STRAIGHT_JOIN syntax:

SELECT STRAIGHT_JOIN id FROM B, A where B.ref = A.ref;
Adam Bellaire
Thanks: I'd never come across the STRAIGHT_JOIN syntax before. Unfortunately, it doesn't seem to speed things up. I think your first sentence is probably right -- it looks like there is no simple way to speed it up. I think my answer is probably going to be breaking it up into multiple queries...
Ben
+1  A: 

From the answers, it seems like you're doing the most efficient thing you can with the SQL. The A table seems to be the big problem, how about splitting it into three individual tables, kind of like a local version of sharding? Alternatively, is it worth denormalising the B table into the A table, assuming B doesn't have too many columns?

Finally, you could just have to buy a faster box to run it on - there's no substitute for horsepower!

Good luck.

endian