Hi,

I have a nested SQL query that is exhibiting results which I can't understand. The query joins the PARTNER and USER tables via the PARTNER_USER table. A partner is basically a collection of users, and the objective of this query is to figure out when the 20th user registered with the partner that has ID 34:

select p.partner_id id,
       u.created_on launch_date
from   user u join partner_user pu
using (user_id) join partner p
using (partner_id)
where  p.partner_id = 34
and    u.user_id     =
       (select  nu.user_id
       from     user nu
       join     partner_user npu using (user_id)
       join     partner np using (partner_id)
       where    np.partner_id = 34
       order by nu.created_on limit 19, 1)

However, if I change the second-to-last line to

       where    np.partner_id = p.partner_id

the query fails with the error message "Subquery returns more than 1 row". Why does the first query work, but not the second? They look equivalent to me.

Thanks, Don

+1  A: 

When you use the = operator to compare against the result of a subquery, the subquery may return only a single row. If you want to check against all the rows returned by the subquery, you have to use the IN operator.

AND u.User_Id IN ( SELECT .... )
Frederik Gheysels
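One caveat when combining IN with the subquery from the question: as far as I know, MySQL rejects LIMIT directly inside an IN subquery with the error "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'". A common workaround is to wrap the limited query in a derived table. The sketch below assumes the table and column names from the question; the alias twentieth is made up for illustration, and the join to partner is dropped on the assumption that every partner_user row points at an existing partner.

```sql
-- Sketch only: IN over a derived table, so the LIMIT is not directly
-- inside the IN subquery. The alias "twentieth" is hypothetical.
select pu.partner_id id,
       u.created_on  launch_date
from   user u
join   partner_user pu using (user_id)
where  pu.partner_id = 34
and    u.user_id in
       (select user_id
        from   (select   nu.user_id
                from     user nu
                join     partner_user npu using (user_id)
                where    npu.partner_id = 34
                order by nu.created_on
                limit    19, 1) as twentieth);
```

Note that the inner query uses the literal 34 rather than a reference to the outer query, so this does not address the correlated case asked about.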
Eh, is there something incorrect in my answer?
Frederik Gheysels
There's nothing incorrect per se about your answer, it just doesn't answer the question that was asked. Your answer just leads me to phrase my original question in a different way: "Why does the subquery return more than one row in the second case, but not the first?"
Don
Probably because the query optimizer views "LIMIT ..." as something to be evaluated after it's unrolled and optimized the relationships. It's too indirect for it to recognize that your expression can only necessarily result in one row, so it wants the expression as if the set count is unknown.
le dorfier
That really is an awkward form of query for the purpose. Try not to put stuff like LIMIT and ORDER BY into subqueries.
le dorfier
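For what it's worth, the lookup in the question can probably be written without any subquery, with ORDER BY and LIMIT applied to the outer query instead. This is a minimal sketch using the table and column names from the question; it assumes every partner_user row references an existing partner (so the join to partner is not needed) and that a user appears at most once per partner.

```sql
-- Sketch only: the 20th user (by registration time) of partner 34,
-- with ORDER BY / LIMIT on the outer query and no subquery.
select   pu.partner_id id,
         u.created_on  launch_date
from     user u
join     partner_user pu using (user_id)
where    pu.partner_id = 34
order by u.created_on
limit    19, 1;   -- skip the first 19 rows, return the next one
```

If two users can share the same created_on value, adding u.user_id as a second sort key would make the choice of the 20th row deterministic.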
A: 

When you change the where clause in the subquery, you're opening the floodgates. The where clause in the main query doesn't restrict the subquery, so you're getting more than one result.

EDIT: What database is this? I've not come across the "using" construct before...

Jason Punyon
+3  A: 

JPunyon is right. One or the other query has to run first, and then have its results trimmed after the fact.

If you look at the queries as written, the outer query has to know the result of the inner query to apply its where clause. However, when you specify

where    np.partner_id = p.partner_id

in the inner query, then you're trying to make the inner query know the result of the outer query to apply its where clause as well. That's a circular dependency.

As a human, you can read the query and you can tell that in this particular case, you're asking for one particular value in the where clause in the outer query and you're asking to use that same value in the inner query, so it seems as though the database should see that and use the same literal value from the outer query.

In reality, the inner query is simply run first without knowing the possible values of p.partner_id, hence the "multiple rows" error.

Adam Bellaire
That's a great explanation. So what exactly is the difference between my query above and your query here: http://stackoverflow.com/questions/349933/sql-get-nth-item-in-each-group#350001
Don
I haven't got a clue. My best guess is that my other answer is simply wrong, and I overlooked it at the time.
Adam Bellaire
I tried to find my test run of that query, and I couldn't. So if I didn't test it, very likely it never worked to begin with. Sorry to be unhelpful. :(
Adam Bellaire
A: 

@Jason Punyon MySQL supports the USING construct.

uzrbin