views:

115

answers:

3

is there a better way to write this SQL than using WHERE ... IN (subquery)?

  SELECT device.mac, reseller.name, agent.name
  FROM device
  LEFT JOIN global_user
    ON device.global_user_id = global_user.id
  LEFT JOIN agent
    ON global_user.id = agent.global_user_id
  LEFT JOIN reseller
    ON global_user.id = reseller.global_user_id
        OR agent.reseller_id = reseller.id
  WHERE device.global_user_id IN (
    SELECT global_user_id
        FROM reseller
        WHERE id = '200'
  ) OR device.global_user_id IN (
    SELECT global_user_id
        FROM agent
        WHERE reseller_id = '200'
  );

im trying to get a list of all of the devices, with some reseller/agent details, under a particular reseller. this would include devices assigned directly to the reseller and devices assigned to agents under the reseller. reseller.id is unique. it will be executed on postgresql database.

devices are assigned to both agents and resellers. agents are assigned to resellers.

this query works, but i haven't often used an OR in a JOIN and i usually try to avoid subqueries. this query concept will be used often, so i'd like to make sure i haven't overlooked something.

thank you for any feedback.

+3  A: 

You could give this a whirl:

 SELECT device.mac, reseller.name, agent.name
  FROM device
  JOIN
  (
      SELECT global_user_id
      FROM reseller
      WHERE id = '200'
      UNION
      SELECT global_user_id
      FROM agent
      WHERE reseller_id = '200'
  ) r ON device.global_user_id = r.global_user_id
  LEFT JOIN global_user
    ON device.global_user_id = global_user.id
  LEFT JOIN agent
    ON global_user.id = agent.global_user_id
  LEFT JOIN reseller
    ON global_user.id = reseller.global_user_id
        OR agent.reseller_id = reseller.id

Clarification: Always a good idea to try out different variations of a query to make sure you end up with the best performing query (although often, different variations result in the same execution plan being generated by the query optimiser). Speaking from SQL Server point of view, the order in which the query is processed in means the JOINs are processed first before the WHERE clause. So in theory, this JOIN approach should scale the resultset down earlier.

AdaTheDev
For clarification, why would you have him use this instead of the original query?
NickLarsen
@NickLarsen - clarification added
AdaTheDev
@AdaTheDev You make a great point about the join, for a little more reading to anyone who visits this thread, check out this article on order of operations for sql server queries, http://www.bennadel.com/blog/70-SQL-Query-Order-of-Operations.htm
NickLarsen
@AdaTheDev this query does produce the correct result set. thank you for your clarification and comments on order of operations. one of the other answers eliminates the use of the subqueries altogether: http://stackoverflow.com/questions/2210204/is-there-a-better-way-to-write-this-sql-than-using-where-in-subquery/2210314#2210314
dan2342
@AdaTheDev against a larger data set, my original query was taking about 2.5 seconds to return. this query seems to average about a second quicker. the solution i linked above, that also returns a correct reseult, takes about 40 seconds to run on the same larger dataset.
dan2342
@dan2342 - good to hear
AdaTheDev
+1  A: 

How about this?

SELECT d.mac, r.name, a.name
FROM device as d, global_user as g, agent as a, reseller as r
WHERE d.global_user_id = g.id
  AND g.id = a.global_user_id
  AND (g.id = r.global_user_id OR a.reseller_id = r.id)
  AND (r.id = '200' OR a.reseller_id = '200');
Wade
Because reseller.id is unique from comments original poster comments. I would still go with the ON clauses, but the point was you can remove the subqueries all together.
NickLarsen
Certain database flavors would be able to optimize that, but it would probably destroy performance on most servers. That said, it is succinct.
eswald
@Wade this won't show devices that are assigned directly to the reseller. this will only show the devices assigned to an agent. this will produce an incorrect result set.
dan2342
+1  A: 

I try to avoid sub-queries and IN clause if it is easy to replace them. If I understood you DB model correctly, this query should produce the same result:

SELECT      DISTINCT
            device.mac, reseller.name, agent.name
FROM        device
LEFT JOIN   global_user
        ON  device.global_user_id = global_user.id
LEFT JOIN   agent
        ON  global_user.id = agent.global_user_id
LEFT JOIN   reseller
        ON  global_user.id = reseller.global_user_id
        OR  agent.reseller_id = reseller.id
WHERE       reseller.id = '200'
        OR  agent.reseller_id = '200'
van
@van before i started using the subqueries, i almost had your suggestion in use, but i was slightly off. this does produce the correct result set, without the use of subqueries.
dan2342
@van when running this query against a larger data set, it takes 40 seconds to return a result as opposed to my original query that takes 2.5 seconds.
dan2342
@dan2342: very well maybe that this query is not optimal. Ideally you could split it into two queries navigating 2 of the possible paths from `device` to `reseller`. Also, if the query with sub-select first makes such a great performance, I can assume that you might be missing some `indices` on the database which might be key for the performance of your query.
van