ansaurus

Question

SQL (any) Request for insight on a query optimization

Answer 1

A:

YMMV, but I've often found using EXISTS instead of IN makes queries run faster.

SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)

Of course, without seeing the rest of the query and the context, this may not make the query any faster.

A JOIN may be preferable, but if a.id appears more than once in the id column of b, you would have to throw a DISTINCT in there, and then you would more than likely go backwards in terms of optimization.
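To make the duplicate-row point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The table contents and the name column are made up for illustration; only the a.id / b.id relationship comes from the question.

```python
import sqlite3

# Hypothetical data: b references id 1 twice, id 2 once, id 3 never.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1), (1), (2);
""")

# EXISTS only tests for a match, so each row of a appears at most once.
exists_rows = conn.execute(
    "SELECT a.id FROM a "
    "WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id) ORDER BY a.id"
).fetchall()

# A plain join emits one row per matching row of b, so id 1 is repeated.
join_rows = conn.execute(
    "SELECT a.id FROM a JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()

# DISTINCT removes the repeats again, at the cost of an extra dedup step.
distinct_rows = conn.execute(
    "SELECT DISTINCT a.id FROM a JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()

print(exists_rows)    # [(1,), (2,)]
print(join_rows)      # [(1,), (1,), (2,)]
print(distinct_rows)  # [(1,), (2,)]
```

This only shows that the result sets differ; it says nothing about which form is faster on a given server.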

dpmattingly 2009-05-14 21:21:52

What makes you think this will run faster?

Andomar 2009-05-14 21:28:54

Distinct would be required

jim 2009-05-14 21:50:38

Answer 2

A:

I would never use a subquery like this. A join would be much faster.

select a.*
from a 
join b on a.id = b.id

Of course, don't use select * either (especially not when doing a join, since at least one field is repeated); it wastes network resources to send unneeded data.

HLGEM 2009-05-14 21:22:13

What makes you think this will run faster?

Andomar 2009-05-14 21:28:49

Join would be faster. As for the use of *, the example was simplified.

jim 2009-05-14 21:30:47

Execution plan is the same and this example presents a potential duplicate record depending on structure of b.

Jeff O 2009-05-14 21:40:55

yes, it is possible b can contain dup a refs

jim 2009-05-14 21:43:39

Answer 3

A:

Have you looked at the execution plan?

How about

select a.* 
from a 
inner join b
on a.id = b.id

presumably the id fields are primary keys?

Russ Cam 2009-05-14 21:23:33

The ids are keys/indexed.

jim 2009-05-14 21:28:57

Answer 4

+2 A:

Both queries you list are the equivalent of:

select a.* 
from a 
inner join b on b.id = a.id

Almost all optimizers will execute them in the same way.

You could post a real execution plan, and someone here might give you a way to speed it up. It helps if you specify what database server you are using.
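For anyone who wants to compare plans themselves, the following is a rough sketch of how to dump them. It uses SQLite (via Python's sqlite3 module) purely as a convenient stand-in: the statement there is EXPLAIN QUERY PLAN, whereas MySQL uses EXPLAIN, and the plans SQLite prints will not match what MySQL chooses. The helper plan_of, the index name b_id, and the name column are hypothetical.

```python
import sqlite3

def plan_of(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    # The detail text is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER);
    CREATE INDEX b_id ON b (id);
""")

# The three formulations discussed in this thread.
queries = {
    "in": "SELECT a.* FROM a WHERE a.id IN (SELECT id FROM b)",
    "exists": "SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)",
    "join": "SELECT a.* FROM a INNER JOIN b ON b.id = a.id",
}

plans = {label: plan_of(conn, sql) for label, sql in queries.items()}

for label, steps in plans.items():
    print(label)
    for step in steps:
        print("   ", step)
```

The exact wording of each plan step depends on the SQLite version, so treat the output as something to read, not something to rely on.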

Andomar 2009-05-14 21:27:56

mysql innodb, can't post execution plan sorry.

jim 2009-05-14 21:40:13

The title says SQL (any) because it's a theoretical question.

jim 2009-05-14 22:02:15

Lou indicated that b could have duplicate IDs, so a join would not produce the same results.

Jeff O 2009-05-15 20:54:11

@Guiness: You're right

Andomar 2009-05-15 21:15:06

Answer 5

A:

select a.*
from a
inner join (select distinct id from b) c
on a.id = c.id

I tried all 3 versions and they ran about the same. The execution plan was the same (inner join, IN (with and without where clause in subquery), Exists)

Since you are not selecting any other fields from b, I prefer to use WHERE IN (SELECT ...). Anyone looking at the query would know what you are trying to do (only show a row of a if it is in b).

Jeff O 2009-05-14 21:32:52

My EXPLAIN gives SIMPLE select types on everything for the join; for the 'in select' it gives PRIMARY on all of a's tables, then DEPENDENT SUBQUERY. Join seems to be the faster option, even using DISTINCT.

jim 2009-05-14 21:54:16

Answer 6

A:

your problem is most likely in the seven tables within "a"

make the FROM table the one that contains "a.id", then make the next join: inner join b on a.id = b.id

then join in the other six tables.

you really need to show the entire query, list all indexes, and approximate row counts of each table if you want real help

KM 2009-05-14 21:50:49

No, see, I am only asking about the difference between the two queries shown in my question. The 'a' part isn't broken; indexes are correctly used. It isn't slow in the sense that the query could be optimized more; it's slow because it's a huge result set. Listing out the db schema isn't important here; it is reduced down to the parts that I am interested in.

jim 2009-05-14 22:00:16

Answer 7

+2 A:

Your question was about the difference between these two:

select a.* from a where a.id in (select id from b where b.id = a.id)

select a.* from a where a.id in (select id from b)

The former is a correlated subquery. It may cause MySQL to execute the subquery for each row of a.

The latter is a non-correlated subquery. MySQL should be able to execute it once and cache the results for comparison against each row of a.

I would use the latter.
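As a quick sanity check that dropping the inner WHERE clause does not change the result, here is a small sketch, again with Python's sqlite3 module standing in for MySQL and with made-up rows. It demonstrates equivalence of the output only; how many times the subquery is actually executed is up to the server's optimizer and is not shown here.

```python
import sqlite3

# Hypothetical data: b contains ids 2 (twice) and 4.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (2), (2), (4);
""")

# Correlated form: the subquery refers to the outer row through a.id.
correlated = conn.execute(
    "SELECT a.id FROM a "
    "WHERE a.id IN (SELECT id FROM b WHERE b.id = a.id) ORDER BY a.id"
).fetchall()

# Non-correlated form: the subquery stands alone and can be run once.
uncorrelated = conn.execute(
    "SELECT a.id FROM a WHERE a.id IN (SELECT id FROM b) ORDER BY a.id"
).fetchall()

print(correlated)    # [(2,), (4,)]
print(uncorrelated)  # [(2,), (4,)]
```

Because IN is a membership test, the duplicate 2 in b does not duplicate rows of a in either form.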

Bill Karwin 2009-05-14 22:21:04

Thanks. That explains some things.

jim 2009-05-15 03:57:10
