tags:

views:

738

answers:

6

Example Query:

select * 
from A join B on A.ID = B.SOMEVALUE
where A.VALUE="something" and
B.ID = 
       (select ID from B where SOMEVALUE = A.ID and 
              THISDATE = (select max(SOMEDATE) from B where ...))

so, if you can read SQL you should see that I am doing a couple correlated subqueries to narrow down the results of the join . (and yes, this is horribly over-simplified).

In certain cases the subquery:

select ID from B where SOMEVALUE = A.ID and 
    THISDATE = (select max(SOMEDATE) from B where ...)

can return more than 1 value, which causes an error

"Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression."

which I fully expect. This is obviously not a good thing and I have code in place to (hopefully) prevent these duplicates from getting into the database in the first place (ie table B should only have 1 row that matches the

SOMEVALUE = A.ID and max(SOMEDATE)

criteria), however end-users are nothing if not creative in finding ways I can't think of to break software.

So now to my question:

Would it be better to change the first subquery to

select top 1 * from B ...

to prevent the user from seeing an error when/if (hopefully never) this situation arises or let the error come through. I'm leaning to not adding the top statement and letting the error come through rather then let the user see potentially incorrect data. I'm wondering if anyone has any thoughts on Best Practices in a situation like this...

A: 

Why not use LIMIT 1 at the end of your sub-select?

inkedmn
That's exactly the same thing as adding TOP 1. It's just a different flavor of sql (Mysql, etc.)
Michael Haren
Agreed with Michael, it's a different flavour so LIMIT 1 probably wouldn't actually work in this syntax.
Kezzer
Doesn't answer the question, but it does clarify what the question was suggesting to those of us unfamiliar with the TOP 1 syntax.
Jon Ericson
+4  A: 

Normally TOP 1 is a good idea.

Consider a large table with millions of rows with no index on the column you are matching, however you are only looking for a single row.

SELECT TOP 1 will mean the table scan stops as soon as the one item is found.

Without the TOP 1, the table scan will continue right through to the end.

As with anything that involves scanning (or brute force) to do the search. Using TOP 1, it should on average be 50% quicker than not using TOP 1.

However, Depending on what you need to return back, A real performance gain can normally be made by using EXISTS.

Instead of writing

SELECT * FROM table t
WHERE t.id = (SELECT TOP 1 foreignid from table2)

You can use

SELECT * FROM table t
WHERE EXISTS (SELECT 1 from table2 WHERE foreignid = t.id)
John
A: 

If you're looking to return just a single row you could do one of two things. The first is to change your equality check to instead check containment like so

select ID from B where SOMEVALUE = A.ID and 
THISDATE IN (select max(SOMEDATE) from B where ...)

(Notice the IN)

Secondly you could do

select TOP 1 ID from B where SOMEVALUE = A.ID and 
THISDATE IN (select max(SOMEDATE) from B where ...)

if you're only looking for one value only.

Or as you said, you could change the sub-query to SELECT TOP 1 which is okay, in my opinion, because as long as the WHERE clause isn't dependant on the outer-query it will probably execute the nested query only once and rely on a statically stored value from there on out like so

select ID from B where SOMEVALUE = A.ID and 
THISDATE = (select TOP 1 * from B where ...)

So there's a few options, but I'm not entirely sure of their efficiency.

Kezzer
I can't use in because I want the subquery to return only one row, in the scenario I'm trying to describe, it would return multiple rows
pfunk
+4  A: 

Why are you joining table A and B... then selecting from B in the sub query... and comparing to a column in A???

Wouldn't this be equivalent:

select * 
from A join B on A.ID = B.SOMEVALUE
where A.VALUE="something" and
THISDATE = (select max(SOMEDATE) from B where ...))

Also, if you are expecting to get one row total from this entire query... wouldn't this work:

select top 1 * 
from A join B on A.ID = B.SOMEVALUE
where A.VALUE="something" 
Order by B.SOMEDATE DESC
bobwienholt
as I said, the example is horribly over-simplified, you did make me stop and test that idea, but changing the (real) query as you suggest doesn't yield an equivalent result
pfunk
Why don't you show me the real query?
bobwienholt
the second query wouldn't work because from the outer-most query I do want more than one row. But after letting it simmer for a bit I believe I may have jumped the gun and the first query might work. The only difference is that I would get multiple rows where with my query I would generate an error
pfunk
On both examples you could include the "A.VALUE='something'" as part of the ON joining condition, you'd save some milliseconds here by filtering early rather than doing the join first, then filtering
Joe Pineda
A: 

I'd recommend the TOP 1 approach. It will probably help performance (not likely to hurt it).

The notion that without it you'll catch errors is honorable but a little misplaced. If an error occurs here months from now, it will NOT be intuitive at all why it occurred or what's going on. Instead, I'd focus on enforcing data integrity elsewhere.

Michael Haren
+2  A: 

Coding Practice: Placing top1 in a subquery which is required to return a single value.

Pros:

  • Allows execution to continue.
  • Allows the query to ignore insigificant Extra Values.
  • Stops looking for new values when the first Value has been found (performant).

Cons:

  • Prevents the query from complaining about significant Extra Values.
  • If order is not specified in the query, there is no control over which element top1 will select. This can cause a different result when the query is executed a second time.
David B