tags:

views:

434

answers:

9

Something I've noticed recently is people using IN where it seems kind of inappropriate. I'm feeling like I'm missing a trick - perhaps there are speed gains or something?

This query:

SELECT * FROM pages WHERE is_visible IN ($visibility)

Why use that? Why not:

SELECT * FROM pages WHERE is_visible = $visibility

Sure, if it were this:

SELECT * FROM pages WHERE is_visible 
IN ($visibility,$invisibility,$somethingelse)

Then WHERE x IN (1,2,3) is a better alternative to WHERE x = 1 OR x = 2 OR x = 3, right? But surely this is the only scenario where you should use IN?

Anyway thanks for your time - I look forward to your replies!

+8  A: 

Then WHERE x IN (1,2,3) is a better alternative to WHERE x = 1 OR x = 2 OR x = 3, right? But surely this is the only scenario where you should use IN?

No, you can use IN with a subquery as well.

... where field in (select field from another_table)

Otávio Décio
Thank you, Kip.
Otávio Décio
That's a good point. I didn't consider that IN could be used with a subquery
searbe
+5  A: 

Maybe $visibility is dynamically generated, like this:

$visibility = implode(', ', array_map('intval', array(1, 2, 3)));

Where array(1, 2, 3) may come from untrusted sources.

Ionuț G. Stan
+9  A: 

Maybe it is unknown wether $visibility refers to a single value or multiple values? Doesn't quite fit your example, but I have seen such use in other places.

Thomas Lötzer
I, too, have seen this... the memory still haunts me.
rmeador
Mmm a good point - in this case though, visiblity is always 0 or 1
searbe
If $visibility refers to multiple values, how would those values get expanded into the sql? If $visibility = '1,2,3' this doesn't make much sense: SELECT * FROM pages WHERE is_visible IN ('1,2,3')
Emtucifor
A: 

"WHERE x IN (1,2,3)" is the same as "WHERE x = 1 OR x = 2 OR x = 3" anyway.

gbn
+2  A: 

In acts on any kind of set operation, whereas = is on a single value. So you can use in for multiple records on another table, or any other kind of data structure that represents multiple values.

Yishai
+3  A: 

I think more to the point in understand what IN does.

If you do

SELECT * FROM pages WHERE is_visible = $visibility

In the above $visibility would have to be a var so is_visibile = 1, So your SQL collects all where is_visible = 1

Where

SELECT * FROM pages WHERE is_visible IN ($visibility)

The $visibility would be an array of data like @Ionut G. Stan has illustrated.

So your call would look like is_visible IN ('1', '2', '3')

So now your SQL will be collecting all 1,2 & 3 rows.

Hope that makes sense. Its a big difference.

Lee
I don't see how this answers his question. He seems well aware of what IN means.
Kip
@Lee: from your comment that you put on the wrong answer: >>Kip I think you are wrong. Obviously he does not fully understand the IN as IN is not that same as OR. If he did what did he say "Then WHERE x IN (1,2,3) is a better alternative to WHERE x = 1 OR x = 2 OR x = 3, right?"<< The fact is "X IN (1,2,3)" *IS* effectively the same as "X=1 OR X=2 OR X=3". If you say that they are not, in what circumstance would they give different query results? there may be differences in the database engine's implementation, but there's no difference from a black-box perspective.
Kip
+1  A: 

No, it's not a trick. The two statements:

SELECT * FROM pages WHERE is_visible IN ($visibility)
SELECT * FROM pages WHERE is_visible = $visibility

are nearly equivalent. We observe that the two statements are equivalent in the trivial case, for example, when $visibility is a scalar with a value of 1.

But the statements are not equivalent in the non-trivial cases when $visibility contains something else. We can observe a significant difference in behavior of the two forms. Consider what happens with each form when $visibility is a string containing these example values:

    '1,2,3' 
    '1 OR 1=1'
    'select v.val from vals v'

We observe a significant difference in the resultant SQL statements generated from the two forms:

    SELECT * FROM pages WHERE is_visible IN (1,2,3)
    SELECT * FROM pages WHERE is_visible = 1,2,3
    SELECT * FROM pages WHERE is_visible IN (1 OR 1=1 )
    SELECT * FROM pages WHERE is_visible = 1 OR 1=1


A large concern here, with either form of the statement, is the potential for SQL injection. If $visibility is intended to be a scalar value, then using a bind variable in the statement is a more secure approach, since it avoids anyone from sliding 'extra' SQL syntax into the statement. (Of course, using bind variables doesn't prevent all SQL injection, but it is suitable approach to closing one hole. Using a bind variable will also improve scalability, at least on some DBMS platforms such as Oracle.)

Consider what happens when we use a bind variable (placeholder), which we know will NOT be interpreted as SQL syntax. We observe that the two statements ARE indeed equivalent:

    SELECT * FROM pages WHERE is_visible IN ( ? )
    SELECT * FROM pages WHERE is_visible = ? 

for any value supplied for the bind variable.

HTH

spencer7593
You'd still have the possibility of SQL injection with IN or with equals, this answer is irrelevant to the question.
Kip
Kip, you are correct. That's what I was trying to point out. The original question asked whether they were "missing something". You seem to have missed it as well. Neither variant prevents SQL injection, that is most easily addressed using a bind variable. In that case, the statements WOULD be equivalent, even when a NULL value is supplied. If you aren't going to bother using a bind variable, then the statements are _SIGNIFICANTLY_ different in that they are vulnerable in different ways, and we aren't given ANY information as to what $visibility contains. My answer IS relevant.
spencer7593
+2  A: 

There is no trick here to boost performance, but I don't think there is any performance hit either. So both ways are OK. I agree with you that using an IN clause with only one parameter looks funny, but it is still easy enough to follow.

I think this probably happens because a developer thinks it is likely that multiple values are allowed in the future (this way the query wouldn't be changed). Or possibly the developer is used to scenarios where multiple values are possible, and they just wrote it that way out of habit.

Kip
Kip I think you are wrong. Obviously he does not fully understand the IN as IN is not that same as OR. If he did what did he say "Then WHERE x IN (1,2,3) is a better alternative to WHERE x = 1 OR x = 2 OR x = 3, right?"
Lee
Thanks Kip. I think all of the answers to the question are great, but yours answers the question itself :)
searbe
@Lee: your comment belonged on your answer, not this one, as the comment doesn't relate to this answer.
Kip
One other quick note on style, using the IN avoids having to 1) wrap the OR list in parentheses (order of precedence), and avoids having to reference the column name multiple times. Of most import here is that we don't know (from the context) what the value $visibility is. A bind variable would tell us that the database is going to see it as a scalar, whatever it contains.
spencer7593
A: 

Then WHERE x IN (1,2,3) is a better alternative to WHERE x = 1 OR x = 2 OR x = 3, right? But surely this is the only scenario where you should use IN?

IN () and OR are syntactically equivalent. Examining the execution plan for your two suggestions will show this. IN() simply is a more efficient and easy-to-understand notation.

Emtucifor