I need to look up all households with orders. I don't care about the data of the order at all, just that it exists. (Using SQL Server)

Is it more efficient to say something like this:

SELECT Households.HouseholdID, LastName, FirstName, Phone
FROM Households
INNER JOIN Orders ON Orders.HouseholdID = Households.HouseholdID

or this:

SELECT HouseholdID, LastName, FirstName, Phone 
FROM Households 
WHERE EXISTS 
    (SELECT HouseholdID 
     FROM Orders 
     WHERE Orders.HouseholdID = Households.HouseholdID)
+2  A: 

It depends on the database engine and how good it is at optimizing queries. A good, mature query optimizer will make EXISTS faster; others will not. I know that SQL Server can make the query faster, but I'm not sure about others.

KM
A: 

For such a trivial query, it would be no surprise if both variants boil down to the same execution plan, whichever one the system deems most performant. Check the query execution plan to find out.
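
A minimal sketch of comparing the two plans, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in because it runs without a server. The cut-down table definitions, the index name IX_Orders_HouseholdID and the plan() helper are made up for illustration. SQL Server exposes its plans through its own tooling, and its output will look different.

```python
import sqlite3

# Assumption: SQLite in-memory database, not SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Households (HouseholdID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, HouseholdID INTEGER);
    -- Hypothetical index so the engine has something to seek on.
    CREATE INDEX IX_Orders_HouseholdID ON Orders (HouseholdID);
""")

def plan(sql):
    """Return the plan steps SQLite reports for a query (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

join_plan = plan("""
    SELECT Households.HouseholdID, LastName
    FROM Households
    INNER JOIN Orders ON Orders.HouseholdID = Households.HouseholdID
""")

exists_plan = plan("""
    SELECT HouseholdID, LastName
    FROM Households
    WHERE EXISTS (SELECT 1 FROM Orders
                  WHERE Orders.HouseholdID = Households.HouseholdID)
""")

# Print both plans side by side to see whether the engine treats them alike.
print("JOIN:  ", join_plan)
print("EXISTS:", exists_plan)
```

The exact plan text depends on the engine and its version, so treat the output as something to read and compare, not as a fixed result.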

neutrino
+3  A: 

The two queries are not equivalent. The first one will return multiple rows for a household if there are multiple joining records. The EXISTS will likely be more efficient, though, especially if there is no trusted FK constraint that the optimiser can use.

For further details on this last point, see point 9 here: http://www.simple-talk.com/sql/t-sql-programming/13-things-you-should-know-about-statistics-and-the-query-optimizer/

Martin Smith
The linked article is for SQL Server; it does not apply to all databases. As I say in my answer, this depends on your database, but SQL Server will optimize EXISTS better than the join.
KM
+4  A: 

Unless this is a fairly rigid 1:1 relationship (which doesn't seem to make much sense given the wider meaning of households and orders), your queries will return different results whenever a household has more than one matching row in the Orders table.
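
A small sketch of that difference, assuming SQLite (through Python's built-in sqlite3 module) in place of SQL Server so it can run anywhere. The cut-down tables and the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite in-memory database as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Households (HouseholdID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, HouseholdID INTEGER);
    INSERT INTO Households VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Lee');
    -- Household 1 has two orders, household 2 has one, household 3 has none.
    INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
""")

join_rows = conn.execute("""
    SELECT Households.HouseholdID, LastName
    FROM Households
    INNER JOIN Orders ON Orders.HouseholdID = Households.HouseholdID
    ORDER BY Households.HouseholdID
""").fetchall()

exists_rows = conn.execute("""
    SELECT HouseholdID, LastName
    FROM Households
    WHERE EXISTS (SELECT 1 FROM Orders
                  WHERE Orders.HouseholdID = Households.HouseholdID)
    ORDER BY HouseholdID
""").fetchall()

print(join_rows)    # household 1 appears once per matching order
print(exists_rows)  # each qualifying household appears once
```

With this sample data the join yields three rows and the EXISTS version two, because household 1 matches two orders.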

On Oracle (and most DBMSs), I would expect the EXISTS version to run significantly faster, since it only needs to find one row in Orders for the Households record to qualify.

Regardless of the DBMS, I would expect the explain plan to show the difference (provided the tables are large enough that the query would not be resolved by full table scans).

Have you tried testing it? Allowing for caching?

C.

symcbean
A: 

In PostgreSQL, EXISTS would be faster than an inner join.

geff_chang
+1  A: 

As was said earlier, your queries will return different result sets if at least one household has more than one order.

You could work around this by using DISTINCT, but EXISTS (or IN) is more efficient.
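
A rough sketch of the three variants returning the same rows, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for SQL Server. The cut-down tables, the sample rows and the run() helper are made up for illustration. It only checks that the results match and says nothing about relative speed.

```python
import sqlite3

# Assumption: SQLite in-memory database, not SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Households (HouseholdID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, HouseholdID INTEGER);
    INSERT INTO Households VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Lee');
    INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
""")

def run(sql):
    """Run a query and return its rows sorted, for easy comparison (hypothetical helper)."""
    return sorted(conn.execute(sql).fetchall())

# JOIN with DISTINCT to collapse the duplicates produced by multiple orders.
distinct_rows = run("""
    SELECT DISTINCT Households.HouseholdID, LastName
    FROM Households
    INNER JOIN Orders ON Orders.HouseholdID = Households.HouseholdID
""")

# EXISTS: a household qualifies as soon as one order is found.
exists_rows = run("""
    SELECT HouseholdID, LastName
    FROM Households
    WHERE EXISTS (SELECT 1 FROM Orders
                  WHERE Orders.HouseholdID = Households.HouseholdID)
""")

# IN: same idea expressed as a membership test.
in_rows = run("""
    SELECT HouseholdID, LastName
    FROM Households
    WHERE HouseholdID IN (SELECT HouseholdID FROM Orders)
""")

print(distinct_rows == exists_rows == in_rows)
```

One caveat: DISTINCT works on the selected columns, so it matches EXISTS here only because HouseholdID is in the select list and is unique per household.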

See this article:

Quassnoi