I am trying to use parameter substitution with SQLite within Python for an IN clause. Here is a complete, runnable example that demonstrates the query I want, built with string formatting:

import sqlite3

c = sqlite3.connect(":memory:")
c.execute('CREATE TABLE distro (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)')

for name in 'Ubuntu Fedora Puppy DSL SuSE'.split():
  c.execute('INSERT INTO distro (name) VALUES (?)', [ name ] )

desired_ids = ["1", "2", "5", "47"]
result_set = c.execute('SELECT * FROM distro WHERE id IN (%s)' % (", ".join(desired_ids)), ())
for result in result_set:
  print result

It prints out:

(1, u'Ubuntu')
(2, u'Fedora')
(5, u'SuSE')

As the docs state that "[y]ou shouldn’t assemble your query using Python’s string operations because doing so is insecure; it makes your program vulnerable to an SQL injection attack," I am hoping to use parameter substitution.

When I try:

result_set = c.execute('SELECT * FROM distro WHERE id IN (?)', [ (", ".join(desired_ids)) ])

I get an empty result set, and when I try:

result_set = c.execute('SELECT * FROM distro WHERE id IN (?)', [ desired_ids ] )

I get:

InterfaceError: Error binding parameter 0 - probably unsupported type.

While I hope that any answer to this simplified problem will work, I would like to point out that the actual query I want to perform is in a doubly-nested subquery. To wit:

UPDATE dir_x_user SET user_revision = user_attempted_revision 
WHERE user_id IN 
    (SELECT user_id FROM 
     (SELECT user_id, MAX(revision) FROM users WHERE obfuscated_name IN 
      ("Argl883", "Manf496", "Mook657") GROUP BY user_id
     ) 
    )
+2  A: 

Update: this works:

import sqlite3

c = sqlite3.connect(":memory:")
c.execute('CREATE TABLE distro (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)')

for name in 'Ubuntu Fedora Puppy DSL SuSE'.split():
  c.execute('INSERT INTO distro (name) VALUES (?)', ( name,) )

desired_ids = ["1", "2", "5", "47"]
result_set = c.execute('SELECT * FROM distro WHERE id IN (%s)' % ("?," * len(desired_ids))[:-1], desired_ids)
for result in result_set:
  print result

The issue was that you need to have one ? for each element in the input list.

The expression ("?," * len(desired_ids))[:-1] builds a repeating string of "?," and then cuts off the trailing comma, so that there is one question mark for each element in desired_ids.
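
The placeholder-building step can also be pulled out into a small function. What follows is only a sketch: the name in_placeholders is made up for illustration and is not part of the sqlite3 module. It joins a list of "?" marks instead of slicing off a trailing comma, which should produce an equivalent query.

```python
import sqlite3


def in_placeholders(values):
    """Return one '?' per value, comma-separated (hypothetical helper)."""
    return ", ".join(["?"] * len(values))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distro (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
for name in "Ubuntu Fedora Puppy DSL SuSE".split():
    conn.execute("INSERT INTO distro (name) VALUES (?)", (name,))

desired_ids = ["1", "2", "5", "47"]

# Only the question marks are formatted into the SQL text;
# the ids themselves are passed separately as bound parameters.
query = "SELECT * FROM distro WHERE id IN (%s)" % in_placeholders(desired_ids)
rows = conn.execute(query, desired_ids).fetchall()
print(rows)
```

Note that an empty list would produce "IN ()", which SQLite accepts but most other databases reject, so callers may want to handle that case separately.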

Mark Rushakoff
That was a great explanation. Thank you.
Clinton Blackmore
+1  A: 

You do need the right number of ?s, but that doesn't pose a SQL injection risk:

>>> result_set = c.execute('SELECT * FROM distro WHERE id IN (%s)' %
                           ','.join('?'*len(desired_ids)), desired_ids)
>>> print result_set.fetchall()
[(1, u'Ubuntu'), (2, u'Fedora'), (5, u'SuSE')]
Alex Martelli
+1 for the "best" solution to generate the placeholder-list string :-)
Ferdinand Beyer
+1  A: 

I always end up doing something like this:

query = 'SELECT * FROM distro WHERE id IN (%s)' % ','.join('?' for i in desired_ids)
c.execute(query, desired_ids)

There's no injection risk because you're not putting strings from desired_ids into the query directly.
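
To illustrate that claim, here is a rough sketch that mixes a hostile-looking string into desired_ids. The hostile value is invented for the example. Because it travels through the ? mechanism, it should be compared against id as an ordinary value rather than parsed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distro (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
for name in "Ubuntu Fedora Puppy DSL SuSE".split():
    conn.execute("INSERT INTO distro (name) VALUES (?)", (name,))

# Hypothetical hostile input sitting next to a legitimate id.
desired_ids = ["1", "2); DROP TABLE distro; --"]

# The SQL text only ever receives question marks, one per element.
query = "SELECT * FROM distro WHERE id IN (%s)" % ",".join("?" for _ in desired_ids)
rows = conn.execute(query, desired_ids).fetchall()
print(rows)

# The table is still there with all of its rows.
remaining = conn.execute("SELECT COUNT(*) FROM distro").fetchone()[0]
print(remaining)
```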

John Fouhy
The values I'll be using in the IN clause actually come from a file exported from another system. I expect that the risk of injection is minuscule, but you never know when Bobby Tables will show up.
Clinton Blackmore
The risk of injection is 0 because the only thing you're programmatically putting into your query is a bunch of question marks. All a hypothetical attacker can do is control the number of question marks -- that's not an attack vector. The actual externally-supplied data is going through the ? parameter-passing mechanism as usual.
John Fouhy
I see. Yes, you are right.
Clinton Blackmore
A: 

If SQLite has a limit on the length of an SQL statement, an unbounded number of question marks could be one way to break things.

n800s
+1  A: 

According to http://www.sqlite.org/limits.html (item 9), SQLite can't (by default) handle more than 999 parameters to a query (SQLite 3.32.0 and later raise the default to 32766), so the solutions here (generating the required list of placeholders) will fail if the list you're looking IN has more items than that limit. If that's the case, you're going to need to break up the list, then loop over the parts of it and join up the results yourself.

If you don't need thousands of items in your IN clause, then Alex's solution is the way to do it (and appears to be how Django does it).
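
For the large-list case, breaking up the list might look roughly like the sketch below. The function name select_in_chunks and the chunk size of 500 are assumptions chosen for illustration; any size under the parameter limit of your SQLite build should do.

```python
import sqlite3


def select_in_chunks(conn, ids, chunk_size=500):
    """Run the IN query one batch at a time and collect all rows (hypothetical helper)."""
    rows = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # One '?' per id in this batch, so each query stays under the limit.
        placeholders = ",".join("?" * len(chunk))
        query = "SELECT * FROM distro WHERE id IN (%s)" % placeholders
        rows.extend(conn.execute(query, chunk).fetchall())
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distro (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
for name in "Ubuntu Fedora Puppy DSL SuSE".split():
    conn.execute("INSERT INTO distro (name) VALUES (?)", (name,))

# 3000 candidate ids, far more than a single query with 999 parameters allows.
desired_ids = [str(i) for i in range(1, 3001)]
rows = select_in_chunks(conn, desired_ids)
print(len(rows))
```

If the same id can appear in more than one batch, the matching row will come back more than once, so deduplicate the input list first if that matters.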

cibyr
Good to know. Thanks. I may even have to revisit my code.
Clinton Blackmore