views:

134

answers:

6
+1  Q: 

simple sql query

which one is faster

select * from parents p
inner join children c on p.id = c.pid
where p.x = 2

OR

select * from 
(select * from parents where p.x = 2)
p
inner join children c on p.id = c.pid
where p.x = 2
+6  A: 

In MySQL, the first one is faster:

SELECT  *
FROM    parents p
INNER JOIN
        children c
ON      c.pid = p.id
WHERE   p.x = 2

, since using an inline view implies generating and passing the records twice.

In other engines, they are usually optimized to use one execution plan.

MySQL is not very good in parallelizing and pipelining the result streams.

Like this query:

SELECT  *
FROM    mytable
LIMIT 1

is instant, while this one (which is semantically identical):

SELECT  *
FROM    (
        SELECT  *
        FROM    mytable
        )
LIMIT 1

will first select all values from mytable, buffer them somewhere and then fetch the first record.

For Oracle, SQL Server and PostgreSQL, the queries above (and both of your queries) will most probably yield the same execution plans.

Quassnoi
+4  A: 

I know this is a simple case, but your first option is much more readable than the second one. As long as the two query plans are comparable I'd always opt for the more maintainable SQL code which your first example is for me.

jn29098
A: 

I'd think the first. I'm not sure if the optimizer would use any indexes on the the derived table in the second query, or if it would copy out all the rows that match into memory before joining back to the children.

KM
A: 

This is why you have DBAs. It depends entirely on the DBMS, and how your tables and indexes are configured, as to which one runs the fastest.

Database tuning is not a set-and-forget operation, it should be done regularly, as the data changes, to ensure your database runs at peak performance. The question is not really meaningful without specifying:

  • which DBMS you are asking about.
  • what indexes you have on the tables.
  • a host of other possible configuration items (which may also depend on the DBMS, such as clustering).

You should run both those queries through the query optimizer to see which one is fastest, then start using that one. That's assuming the difference in noticeable in the first place. If the difference is minimal, go for the easiest to read/maintain.

paxdiablo
A: 

For me, in the second query you are saying, I don't trust the optimizer to optimize this query so I'll provide some 'hints'.

I'd say, trust the optimizer until it let's you down and only then consider trying to do the optimizer's job for it.

onedaywhen
+2  A: 

It depends on how good the database is at optimising the query.

If the database manages to optimise the second one into the first one, they are equally fast, otherwise the first one is faster.

The first one gives more freedom for the database to optimise the query. The second one suggests a specific order of doing things. Either the database is able to see past this and optimise it into a single query, or it will run the query as two separate queries with the subquery as an intermediate result.

A database like SQL Server keeps statistics on what the database tables contain, which it uses to determine how to execute the query in the most efficient way. For example, depending on what will elliminate most records it can either start with joining the tables or filtering the parents table on the condition. If you write a query that forces a specific order, that might not be the most efficient order.

Guffa