I have an array with a huge number of IDs that I would like to select from the DB.

The usual approach would be to do SELECT blabla FROM xxx WHERE yyy IN (ids) OPTION (RECOMPILE). (The OPTION (RECOMPILE) is needed because SQL Server is not intelligent enough to see that storing this query in its plan cache is a huge waste of memory.)

However, SQL Server is horrible at this type of query when the number of IDs is high; the parser it uses is simply too slow. Let me give an example:

SELECT * FROM table WHERE id IN (288525, 288528, 288529,<about 5000 ids>, 403043, 403044) OPTION (RECOMPILE)

Time to execute: ~1100 msec (this returns approx. 200 rows in my example)

Versus:

SELECT * FROM table WHERE id BETWEEN 288525 AND 403044 OPTION (RECOMPILE)

Time to execute: ~80 msec (this returns approx. 50000 rows in my example)

So even though I get 250 times more data back, it executes 14 times faster...

So I built this function to take my list of IDs and build something that is a reasonable compromise between the two: something that doesn't return 250 times as much data, yet still gets the benefit of the faster parse.

  // Keys at most this far apart are merged into one BETWEEN range, at the
  // cost of fetching up to this many unwanted rows per merged gap.
  private const int MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH = 5;
  // Builds a WHERE fragment covering as many keys as fit in ~63000 characters,
  // advancing startindex past the keys consumed. Assumes keys is sorted and distinct.
  public static string MassIdSelectionStringBuilder(
       List<int> keys, ref int startindex, string colname)
  {
     const int maxlength = 63000;
     if (keys.Count - startindex == 1)
     {
        string idstring = String.Format("{0} = {1}", colname, keys[startindex]);
        startindex++;
        return idstring;
     }
     StringBuilder sb = new StringBuilder(maxlength + 1000);
     List<int> individualkeys = new List<int>(256);
     int min = keys[startindex++]; // [min, max] tracks the current run of nearby keys
     int max = min;
     sb.Append("(");
     const string betweenAnd = "{0} BETWEEN {1} AND {2}\n";
     // Stop once the clause, plus ~8 chars for each pending IN-list id, would exceed maxlength.
     for (; startindex < keys.Count && sb.Length + individualkeys.Count * 8 < maxlength; startindex++)
     {
        int key = keys[startindex];
        // Gap too large to bridge: flush the current run as a range or an individual id.
        if (key > max + MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH)
        {
           if (min == max)
              individualkeys.Add(min);
           else
           {
              if(sb.Length > 2)
                 sb.Append(" OR ");
              sb.AppendFormat(betweenAnd, colname, min, max);
           }
           min = max = key;
        }
        else
        {
           max = key;
        }
     }
     // Flush the final run.
     if (min == max)
        individualkeys.Add(min);
     else
     {
        if (sb.Length > 2)
           sb.Append(" OR ");
        sb.AppendFormat(betweenAnd, colname, min, max);
     }
     if (individualkeys.Count > 0)
     {
        if (sb.Length > 2)
           sb.Append(" OR ");
        string[] individualkeysstr = new string[individualkeys.Count];
        for (int i = 0; i < individualkeys.Count; i++)
           individualkeysstr[i] = individualkeys[i].ToString();
        sb.AppendFormat("{0} IN ({1})", colname,  String.Join(",",individualkeysstr));
     }
     sb.Append(")");
     return sb.ToString();
  }

It is then used like this:

 List<int> keys; //Sort and make unique
 ...
 for (int i = 0; i < keys.Count;)
 {
    string idstring = MassIdSelectionStringBuilder(keys, ref i, "id");
    string sqlstring = string.Format("SELECT * FROM table WHERE {0} OPTION (RECOMPILE)", idstring);
    // ...execute sqlstring and accumulate the results...
 }
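
For illustration, tracing the builder over a small made-up key list: with keys = {288525, 288526, 288529, 300000} and colname = "id", a single call consumes all four keys and returns a fragment like

 (id BETWEEN 288525 AND 288529
  OR id IN (300000))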

However, my question is... Does anyone know of a better/faster/smarter way to do this?

A: 

Adding RECOMPILE is not a good idea. Precompiling means SQL does not save your query results, but it does save the execution plan, thereby trying to make the query faster. If you add RECOMPILE, you always have the overhead of compiling the query. Try creating a stored procedure that holds the query and call it from there, as stored procedures are always precompiled.

zapping
You miss the point. The RECOMPILE is used to FORCE the query cache NOT to store the plan. A query like this easily takes up ½ megabyte of memory, and a separate cache entry is created for every distinct length of the IN list. So if you do 6 selections, first with 300 ids, then 301, 302, 500, 600, and 800, you may have used perhaps 6 MB of SQL Server memory for nothing... Actually, building the query plan for this super trivial query takes SQL Server almost zero time, since there is usually only a single index to consider and there are no joins.
Cine
http://blogs.msdn.com/queryoptteam/archive/2006/03/31/565991.aspx
zapping
Are you sure that RECOMPILE makes the query not cache, or the query plan not cache? And how do you get the ids? Are they always constant, or do they change?
zapping
Yes. Try it yourself:

DBCC FREEPROCCACHE;
SELECT * FROM table WHERE id IN (1);
SELECT * FROM table WHERE id IN (1,2);
SELECT * FROM table WHERE id IN (1,2,3) OPTION (RECOMPILE);
SELECT sqlt.TEXT AS 'Cached query', qstat.execution_count, cplan.size_in_bytes,
       qstat.last_worker_time, qstat.max_worker_time, qstat.last_execution_time
FROM sys.dm_exec_cached_plans cplan
INNER JOIN sys.dm_exec_query_stats qstat ON cplan.plan_handle = qstat.plan_handle
CROSS APPLY sys.dm_exec_sql_text(qstat.sql_handle) sqlt
OPTION (RECOMPILE);

The last one with 3 ids won't be in the query cache.
Cine
+1  A: 

If the list of IDs were in another table that was indexed, this would execute a whole lot faster using a simple INNER JOIN.

If that isn't possible, then try creating a TABLE variable like so:

DECLARE @tTable TABLE
(
   Id int
)

Store the ids in the table variable first, then INNER JOIN it to your table xxx. I have had limited success with this method, but it's worth a try.
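
For illustration, a minimal sketch of that flow, reusing the question's placeholder table xxx and column yyy (a sketch under those assumed names, not Neil's exact code):

INSERT INTO @tTable (Id) VALUES (288525)
INSERT INTO @tTable (Id) VALUES (288528)
-- ...one INSERT per remaining id...

SELECT x.*
FROM xxx x
INNER JOIN @tTable t ON t.Id = x.yyy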

Neil
@tTable idea: That would work... if it weren't for the fact that you need 5000 INSERTs to get the ids into the table variable in the first place. The result is EVEN slower than anything else.
Cine
INNER JOIN idea: Yes, this is also useful. However, it is only valid when either 1. you actually have the ids in a table already, or 2. the cost of selecting those IDs is low, or you don't actually need the ids for anything else.
Cine
A: 

Another dirty idea, similar to Neil's:

  • Have an indexed view which holds the IDs alone, based on your business condition
  • Then join the view with your actual table to get the desired result.
Ramesh Vel
If there were a constant number of IDs, the situation would be a lot simpler. It is possible to make an indexed view defined as (SELECT 12 UNION ALL SELECT 13 etc.), but 1. that requires taking a schema modification lock (which would block ALL other threads in your DB), 2. you would need to build the index on the view, and 3. the creation query of the view would be even longer than the IN query, so the parser would take even longer.
Cine
+1  A: 

You're using (key > max+MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH) as the check to determine whether to do a range fetch instead of an individual fetch. It appears that's not the best way to do that.

Let's consider the 4 ID sequences {2,7}, {2,8}, {1,2,7}, and {1,2,8}. They translate into

ID BETWEEN 2 AND 7
ID IN (2, 8)
ID BETWEEN 1 AND 7
ID BETWEEN 1 AND 2 OR ID IN (8)

The decision to fetch and filter the IDs 3-6 now depends only on the difference between 2 and 7/8. However, it does not take into account whether 2 is already part of a range or an individual ID.

I think the proper criterion is how many individual IDs you save. Converting two individual IDs into a range has a net benefit of 2 * Cost(individual) - Cost(range), whereas extending an existing range has a net benefit of Cost(individual) - Cost(range extension).

MSalters
I agree, it is not optimal. I don't agree with your cost calculation, though. Selecting a range of 5 costs almost the same as selecting 1, since the execution has to descend the B-tree index either way, and the overhead of doing that is rather high. The particular table I am working on has a tiny row size, so I could extend the range to 50 and the query execution time would stay about the same. But then I get rather many results back.
Cine
A: 

The efficient way to do this is to:

  1. Create a temporary table to hold the IDs
  2. Call a SQL stored procedure with a string parameter holding all the comma-separated IDs
  3. The SQL stored procedure uses a loop with CHARINDEX() to find each comma, then SUBSTRING to extract the string between two commas, CONVERT to make it an int, and INSERT INTO @Temporary VALUES ... to put it into the temporary table
  4. INNER JOIN the temporary table or use it in an IN (SELECT ID from @Temporary) subquery

Every one of these steps is extremely fast because a single string is passed, no compilation is done during the loop, and no substrings are created except the actual id values.

No recompilation is done at all when this is executed as long as the large string is passed as a parameter.

Note that in the loop you must track the prior and current comma positions in two separate variables.
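
For illustration, a minimal sketch of steps 2-4 under stated assumptions: the procedure name, the target table xxx, and its int column id are placeholders, and the id string is strictly comma-separated with no trailing comma.

CREATE PROCEDURE dbo.SelectByIdList
   @IdList varchar(max)  -- e.g. '288525,288528,288529'
AS
BEGIN
   DECLARE @Temporary TABLE (Id int PRIMARY KEY)
   DECLARE @prev int, @curr int
   SET @prev = 0                        -- position of the previous comma
   SET @curr = CHARINDEX(',', @IdList)  -- position of the current comma
   WHILE @curr > 0
   BEGIN
      -- the substring between the two commas is one id
      INSERT INTO @Temporary
      VALUES (CONVERT(int, SUBSTRING(@IdList, @prev + 1, @curr - @prev - 1)))
      SET @prev = @curr
      SET @curr = CHARINDEX(',', @IdList, @prev + 1)
   END
   -- the last id has no trailing comma
   INSERT INTO @Temporary
   VALUES (CONVERT(int, SUBSTRING(@IdList, @prev + 1, LEN(@IdList) - @prev)))

   SELECT x.*
   FROM xxx x
   INNER JOIN @Temporary t ON t.Id = x.id
END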

RobC
Unfortunately the time it takes to parse the string is very high. There are 3 basic ways to do it. 1. As you describe, with CHARINDEX etc.; this is extremely slow. 2. Use XML:

CREATE FUNCTION dbo.SplitIdXml(@idslist xml)
RETURNS TABLE
AS
RETURN (SELECT T.c.value('.', 'int') AS Id FROM @idslist.nodes('/b') AS T(c))
GO
SELECT * FROM dbo.SplitIdXml('<b>123</b><b>124</b>')

This has decent speed, but time trials show it is still ~3 times slower than the BETWEEN solution. 3. Use SQLCLR to parse the string; this ought to be fast, but there is some overhead in table-valued functions, so it ends up slow as well.
Cine
+1  A: 

In my experience the fastest way was to pack numbers in binary format into an image. I was sending up to 100K IDs, which works just fine:

Mimicking a table variable parameter with an image

Yet it was a while ago. The following articles by Erland Sommarskog are up to date:

Arrays and Lists in SQL Server
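
For illustration, a rough sketch of the server-side unpacking in that technique, under stated assumptions: a dbo.Numbers helper table holding the integers 1..N, and ids packed client-side as 4-byte big-endian integers (SQL Server reads binary as big-endian). Sommarskog's articles cover the real variants.

DECLARE @blob varbinary(max)  -- the packed ids, passed as a parameter

SELECT CONVERT(int, SUBSTRING(@blob, (n.Number - 1) * 4 + 1, 4)) AS Id
FROM dbo.Numbers n
WHERE n.Number <= DATALENGTH(@blob) / 4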

AlexKuznetsov
That last link is really useful!
Cine
A: 

Off the cuff here - does incorporating a derived table help performance at all? I am not set up to test this fully; I just wonder if this would optimize to use BETWEEN and then filter the unneeded rows out:

Select * from 
( SELECT *
  FROM dbo.table 
  WHERE ID between <lowerbound> and <upperbound>) as range
where ID in ( 
    1206,
    1207,
    1208,
    1209,
    1210,
    1211,
    1212,
    1213,
    1214,
    1215,
    1216,
    1217,
    1218,
    1219,
    1220,
    1221,
    1222,
    1223,
    1224,
    1225,
    1226,
    1227,
    1228,
    <...>,
    1230,
    1231
)
onupdatecascade
In your form, no. Any decent SQL query optimizer will notice that it can push the ID selection into the inner query and also eliminate the useless derived table. You could FORCE it to do what you want by adding a TOP 99999999 to the inner query, but the result is much slower.
Cine
Ah, well, it was a stab. I thought a "decent SQL query optimizer" might find that there are two conditions (IN and BETWEEN), note that the range scan for BETWEEN was less expensive, and process that first. Long shot, I guess.
onupdatecascade