Hello all,

I have a very large db that I am working with, and I need to know how to select a large set of IDs that don't follow any real pattern. This is the segment of code I have so far:

longIdList = [1, 3, 5, 8, ...]
for id in longIdList:
    # one query per id: this is the slow part
    sql = "select * from Table where id = %s" % id
    cursor.execute(sql)
    # fetch from the cursor itself; some drivers return a row count from execute()
    print(cursor.fetchone())

I was thinking that there must be a quicker way of doing this. My script needs to search through a db that has over 4 million IDs. Is there a way to use a single select command to grab them all in one shot? Could I use the where clause with a list of IDs? Thanks

+2  A: 

You can use IN to look for multiple items simultaneously:

SELECT * FROM Table WHERE id IN (x, y, z, ...)

So maybe something like:

sql = "select * from Table where id in (%s)" % (', '.join(str(id) for id in longIdList))
John Kugelman
Depending on the DBMS, this can have disastrous performance results when the list gets long. SQL Server, IIRC, will turn `IN` clauses into a massive set of `OR`s.
ngroot
I've used `IN` for the same thing before. If you're using MySQL, I think you'll hit the maximum query length before you run into performance issues. I'd use the Python `time` module to benchmark what you have and see how that compares with running them ~1000 at a time using `IN`.
Adam Fraser
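A rough sketch of the batching and timing idea from the comment above. It uses an in-memory SQLite database so that it runs on its own; the `items` table, the `chunks` and `fetch_in_batches` helpers, and the batch size of 500 are assumptions made for illustration (500 stays under the 999 bound-parameter default of older SQLite builds), not anything from the original post. Timings will differ a lot on a real server, where each query also pays a network round trip.

```python
import sqlite3
import time

def chunks(seq, size):
    """Yield consecutive slices of seq holding at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

def fetch_in_batches(conn, ids, batch_size=500):
    """Fetch the rows for `ids` with one IN query per batch."""
    rows = []
    for batch in chunks(ids, batch_size):
        # one "?" placeholder per id in this batch
        placeholders = ", ".join("?" for _ in batch)
        sql = "SELECT id, value FROM items WHERE id IN (%s)" % placeholders
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows

# Hypothetical stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    ((i, "row %d" % i) for i in range(1, 10001)),
)

wanted = list(range(1, 10001, 7))

# Baseline: one query per id, as in the question.
start = time.perf_counter()
one_by_one = [
    conn.execute("SELECT id, value FROM items WHERE id = ?", (i,)).fetchone()
    for i in wanted
]
t_single = time.perf_counter() - start

# Batched: one IN query per 500 ids.
start = time.perf_counter()
batched = fetch_in_batches(conn, wanted)
t_batched = time.perf_counter() - start

print("one by one: %.4fs, batched: %.4fs" % (t_single, t_batched))
```

Row order from an `IN` query is not guaranteed, so compare the two result sets after sorting.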
@Adam Fraser: I will have to try that.
Richard
+2  A: 

Serialize the list in some fashion (comma-separated or XML would be reasonable choices), then have a stored procedure on the other side that will deserialize the list into a temp table. You can then do an INNER JOIN against the temp table.

ngroot
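A minimal sketch of the temp-table approach, under some loud assumptions: SQLite has no stored procedures, so the list is loaded client-side with `executemany` instead of being deserialized on the server, and the `items` and `wanted_ids` names are made up for the example. On a DBMS with stored procedures the loading step would look different, but the final INNER JOIN has the same shape.

```python
import sqlite3

# Hypothetical stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    ((i, "row %d" % i) for i in range(1, 1001)),
)

# Ids to look up; 5000 does not exist in items. The list must not contain
# duplicates, because the temp table below uses id as its primary key.
long_id_list = [1, 3, 5, 8, 999, 5000]

# Load the ids into a temporary table (dropped when the connection closes).
conn.execute("CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO wanted_ids (id) VALUES (?)",
    ((i,) for i in long_id_list),
)

# A single join replaces the per-id queries.
rows = conn.execute(
    "SELECT items.id, items.value "
    "FROM items INNER JOIN wanted_ids ON wanted_ids.id = items.id "
    "ORDER BY items.id"
).fetchall()

print(rows)
```

Ids that have no matching row, like 5000 here, simply drop out of the join result.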
+4  A: 

Yes, you can use SQL's IN() predicate to compare a column to a set of values. This is standard SQL and it's supported by every SQL database.

There may be a practical limit to the number of values you can put in an IN() predicate before it becomes too inefficient or simply exceeds a length limit on SQL queries. The largest practical list of values depends on what database you use (in Oracle it's 1000; in MS SQL Server it's around 2000). If your list exceeds a few dozen values, I'd seek another solution.

For example, @ngroot suggests using a temp table in his answer. For analysis of this solution, see this blog by StackOverflow regular @Quassnoi: Passing parameters in MySQL: IN list vs. temporary table.

Parameterizing a list of values into an SQL query in a safe way can be tricky. You should be mindful of the risk of SQL injection.

Also see this popular question on Stack Overflow: Parameterizing a SQL IN clause?
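
One common way to do this, sketched here with Python's `sqlite3` module: build the IN list out of placeholders only, and pass the values separately so the driver binds them. The `items` table and the `select_by_ids` helper are hypothetical. The placeholder token is driver-specific: `sqlite3` uses `?`, while MySQLdb and psycopg2 use `%s`.

```python
import sqlite3

def select_by_ids(conn, ids):
    """Return the rows whose id is in `ids`, using bound parameters."""
    ids = [int(i) for i in ids]  # raises ValueError on anything non-numeric
    if not ids:
        return []  # an empty IN () list is a syntax error on many databases
    # Only "?" characters are interpolated into the SQL text;
    # the id values themselves are passed separately to execute().
    placeholders = ", ".join("?" for _ in ids)
    sql = "SELECT id, value FROM items WHERE id IN (%s) ORDER BY id" % placeholders
    return conn.execute(sql, ids).fetchall()

# Hypothetical stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    ((i, "row %d" % i) for i in range(1, 101)),
)

found = select_by_ids(conn, [8, 3, 1, 5])
print(found)
```

Because the values never become part of the SQL string, a hostile value such as `"1; DROP TABLE items"` cannot change the query; in this sketch it fails the `int()` conversion before any SQL runs.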

Bill Karwin
Yeah, I have to pass a list of 321718 ids. I am joining two databases together where their timestamps match. I can code this; it just takes forever to run.
Richard
You don't say what brand of database you're using, but many databases allow cross-database joins, as long as the databases are on the same server instance.
Bill Karwin
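
Since the brand of database is unknown, here is only an illustration of the idea using SQLite, where a second database file is attached to the connection and joined like any other table. The file names, the `events` and `readings` tables, and the `ts` column are all invented for the example. In MySQL the equivalent is to qualify tables as `db_name.table_name` within one server.

```python
import os
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
path_a = os.path.join(tmpdir, "a.db")  # hypothetical first database
path_b = os.path.join(tmpdir, "b.db")  # hypothetical second database

# Build the second database in its own file.
conn_b = sqlite3.connect(path_b)
conn_b.execute("CREATE TABLE readings (ts INTEGER, reading REAL)")
conn_b.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(100, 1.5), (200, 2.5), (400, 4.5)],
)
conn_b.commit()
conn_b.close()

# Build the first database, then attach the second to the same connection.
conn = sqlite3.connect(path_a)
conn.execute("CREATE TABLE events (ts INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(100, "start"), (300, "middle"), (400, "end")],
)
conn.commit()  # finish the open transaction before ATTACH
conn.execute("ATTACH DATABASE ? AS other", (path_b,))

# Join across the two databases on matching timestamps in one statement,
# instead of pulling ids into Python and sending them back.
matched = conn.execute(
    "SELECT e.ts, e.label, r.reading "
    "FROM events AS e INNER JOIN other.readings AS r ON r.ts = e.ts "
    "ORDER BY e.ts"
).fetchall()
conn.close()

print(matched)
```

With an index on the timestamp columns, the database can do the matching itself, which avoids passing hundreds of thousands of ids through the client.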