I have a transaction table of 1 million rows. The table has a field named "Code" that holds the customer's ID. There are about 10,000 different customer codes.

I have a GUI that allows the user to render a report from the transaction table. The user may select an arbitrary number of customers for the report.

I used the IN operator first, and it works for a few customers:

SELECT * FROM TRANS_TABLE WHERE CODE IN ('...', '...', '...')

I quickly ran into problems when I selected a few thousand customers; there is a limit on how many values the IN operator accepts.

An alternative is to create a temporary table with a single CODE field and insert the selected customer codes into it with INSERT statements. I can then use

SELECT A.* FROM TRANS_TABLE A INNER JOIN TEMP B ON (A.CODE=B.CODE)

This works well for large selections. However, there is performance overhead in creating the temporary table, inserting the codes, and dropping the table afterwards.

Are you aware of a better solution for this situation?

A: 

As you have to pass the IDs somehow, IN should be the fastest way.

MSDN mentions:

Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632. To work around this problem, store the items in the IN list in a table.

If you can still use IN and the query is too slow, you could try adjusting your indexes, e.g. adding a covering index for your query. Looking up random values via the clustered index can be slow because of the random disk I/O required; a covering index could reduce that problem.

If you really exceed the limit of IN and create a temporary table, I don't expect the creation of the table itself to be a major problem, as long as you insert the values in one batch (not thousands of separate queries, of course). Choose the method with the least overhead, such as one of those mentioned here:

http://blog.sqlauthority.com/2008/07/02/sql-server-2008-insert-multiple-records-using-one-insert-statement-use-of-row-constructor/

Of course, if there is some static pattern in your IDs, you could select by that (e.g. in SPs or UDFs). And if you get those thousands of IDs out of your database itself, then instead of passing them back and forth you could just store them there or use a subquery...
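To illustrate the "insert the values at once" point above, here is a minimal client-side sketch using Python's stdlib sqlite3 module (the table and column names are illustrative; a real setup would use your server's temp-table syntax and driver, but the batching idea is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trans_table (id INTEGER, code TEXT)")
conn.executemany("INSERT INTO trans_table VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "A")])

# Load all selected codes into the temp table in one batched call
# instead of issuing thousands of individual INSERT round trips.
selected = ["A", "C"]
conn.execute("CREATE TEMP TABLE temp_codes (code TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO temp_codes VALUES (?)",
                 [(c,) for c in selected])

rows = conn.execute(
    "SELECT t.id, t.code FROM trans_table t "
    "JOIN temp_codes c ON t.code = c.code"
).fetchall()
```

The join then returns only the transactions whose code was selected, without any length limit on the selection.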

Wikser
My applications work with different kinds of databases, ranging from Firebird to MSSQL and MySQL. These databases may have different limits on the size of an IN list.
Chau Chee Yang
A: 

Maybe you could pass the customer codes to a stored procedure as a comma-separated string and use the split SQL function mentioned here: http://www.devx.com/tips/Tip/20009.

Then declare a table variable, insert the split values into it, and use it in an IN clause.

CREATE PROCEDURE prc_dosomething (
    @CustomerCodes varchar(MAX)
)
AS

DECLARE @customercodetable table(code varchar(10)) -- or whatever length you require.
-- A table variable cannot be assigned with SET; populate it with INSERT ... SELECT.
INSERT INTO @customercodetable (code)
SELECT code FROM dbo.UTILfn_Split(@CustomerCodes) -- see the article above for the split function; its exact signature and column name may differ.

-- do some magic stuff here :).
bartvdvliet
+2  A: 

If you use SQL Server 2008, the fastest way to do this is usually with a Table-Valued Parameter (TVP):

CREATE TYPE CodeTable AS TABLE
(
    Code int NOT NULL PRIMARY KEY
)

DECLARE @Codes AS CodeTable
INSERT @Codes (Code) VALUES (1)
INSERT @Codes (Code) VALUES (2)
INSERT @Codes (Code) VALUES (3)
-- Snip codes

SELECT t.*
FROM @Codes c
INNER JOIN Trans_Table t
    ON t.Code = c.Code

Using ADO.NET, you can populate the TVP directly from your code, so you don't need to generate all those INSERT statements: just pass in a DataTable and ADO.NET will handle the rest. So you can write a stored procedure like this:

CREATE PROCEDURE GetTransactions
    @Codes CodeTable READONLY
AS

SELECT t.*
FROM @Codes c
INNER JOIN Trans_Table t
    ON t.Code = c.Code

... and just pass in the @Codes value as a parameter.

Aaronaught
A: 

You can generate SQL such as

SELECT * FROM TRANS_TABLE WHERE CODE IN (?,?,?,?,?,?,?,?,?,?,?)

and reuse it in a loop until you've loaded all the IDs you need. The advantage is that if you only need a few IDs, your DB doesn't need to parse all those IN clauses. If needing many IDs is a rare case, the performance hit may not matter. If you are not worried about the SQL parsing cache, you can limit the size of the IN clause to the DB's actual limit, so that sometimes you don't need a loop and other times you do.
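The loop above can be sketched like this in Python with the stdlib sqlite3 module (table name, chunk size, and the padding trick are illustrative assumptions; the point is that one fixed-size statement is parsed once and reused):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trans_table (id INTEGER, code INTEGER)")
conn.executemany("INSERT INTO trans_table VALUES (?, ?)",
                 [(i, i % 5) for i in range(20)])

def fetch_by_codes(conn, codes, chunk_size=3):
    """Reuse one IN (?,?,?) statement of fixed size across chunks."""
    sql = ("SELECT id, code FROM trans_table WHERE code IN (%s)"
           % ",".join("?" * chunk_size))
    rows = []
    for i in range(0, len(codes), chunk_size):
        chunk = codes[i:i + chunk_size]
        # Pad a short final chunk by repeating one of its values, so the
        # statement text (and thus the cached plan) never changes.
        chunk += [chunk[0]] * (chunk_size - len(chunk))
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows

rows = fetch_by_codes(conn, [0, 2, 4, 1])
```

Note the padding assumes the code list has no duplicates across chunks; otherwise you would need to deduplicate the results.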

Mr. Shiny and New