
I need to select a large number of rows from an even larger table that is keyed on an autoincrement identity column. I have the primary key values for each of the rows I'm trying to select, but the list of keys can be very large.

Often, but not always, the rows being selected are contiguous, so I implemented a mechanism that converts the select into a set of range clauses for all of the entries that are clumped together ([PrimaryKey] BETWEEN 151 AND 217), with a fallback that selects all of the isolated entries with an IN clause.
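
(For illustration only: my implementation does this clumping client-side, but if the keys were loaded into a temp table #Keys, the same grouping could be expressed in T-SQL as a classic gaps-and-islands query.)

-- Sketch only: each run of contiguous IDs shares a constant ID - rn,
-- so grouping on that difference yields one row per contiguous range.
;WITH Numbered AS
  (SELECT ID, ROW_NUMBER() OVER (ORDER BY ID) AS rn FROM #Keys)
SELECT MIN(ID) AS RangeStart, MAX(ID) AS RangeEnd
FROM Numbered
GROUP BY ID - rn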

In the end I get something like this:

SELECT * FROM Table WHERE ([PrimaryKey] BETWEEN 151 AND 217) OR ([PrimaryKey] BETWEEN 314 AND 378) OR ...
OR [PrimaryKey] IN (1,3,7,14,147...)

This works great for the cases where I have mostly large ranges, but it breaks down as the query gets longer. I just ran across a degenerate case with a large number of "pairs" of entries, which generated a BETWEEN clause for every two entries; that query spent more than 15 minutes trying to produce an execution plan before I gave up on it.

The first thing that comes to mind is to raise the threshold at which I start generating ranges instead of individual values from 2 to something higher (10, perhaps?), but I was wondering if there is a better solution out there.

+7  A: 

Create a temp table with the values you want to select and join from your main table to the temp table. That way there is practically no limit.
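
A minimal sketch of the idea (#SelectedKeys, MyTable, and the [PrimaryKey] column are placeholder names):

-- Load the wanted keys into a temp table, then join to it.
CREATE TABLE #SelectedKeys (ID int PRIMARY KEY)

INSERT INTO #SelectedKeys (ID) VALUES (1)
INSERT INTO #SelectedKeys (ID) VALUES (3)
-- ... bulk-load the rest of the list in practice ...

SELECT t.*
FROM MyTable t
INNER JOIN #SelectedKeys k ON k.ID = t.[PrimaryKey]

DROP TABLE #SelectedKeys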

Otávio Décio
could also use a table variable here too
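
For example (sketch only, same placeholder names):

DECLARE @SelectedKeys TABLE (ID int PRIMARY KEY)
INSERT INTO @SelectedKeys (ID) VALUES (1)
-- ... load the remaining keys ...
SELECT t.* FROM MyTable t INNER JOIN @SelectedKeys k ON k.ID = t.[PrimaryKey]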
Russ Cam
+1  A: 

As user ocdecio correctly proposes, you could join to a temporary table containing all the IDs. You could also try splitting up the ORs into separate UNION parts:

SELECT * FROM Table WHERE [PrimaryKey] BETWEEN 151 AND 217
UNION
SELECT * FROM Table WHERE [PrimaryKey] BETWEEN 314 AND 378
UNION
...
UNION
SELECT * FROM Table WHERE [PrimaryKey] IN (1,3,7,14,147...)
splattne
+1  A: 

I can understand the intention of your mechanism. In practice, however, a good index and faith in the optimiser are usually better.

If you were to create a dozen conditions in the WHERE clause, the query engine needs to check each one of them until it finds a match.

Equally, creating multiple queries and unioning them would mean index scans or index lookups on the table multiple times.

It is true, however, that an IN clause can get very slow when you have a large list. In that case using a join is usually faster. From experience, this is always my preferred option.

Yet it is true that BETWEEN is particularly efficient for larger ranges. With that in mind, it may be beneficial to use the UNION mechanism where the first recordset uses a JOIN and the rest use BETWEEN, provided each BETWEEN covers a significant range.
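
A sketch of that shape (#IsolatedKeys and MyTable are placeholders; the isolated keys are assumed to be loaded already):

SELECT t.*
FROM MyTable t
INNER JOIN #IsolatedKeys k ON k.ID = t.[PrimaryKey]
UNION ALL -- the ranges never overlap the isolated keys, so ALL avoids a distinct sort
SELECT * FROM MyTable WHERE [PrimaryKey] BETWEEN 151 AND 217
UNION ALL
SELECT * FROM MyTable WHERE [PrimaryKey] BETWEEN 314 AND 378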

A significant consideration is the time it takes to prepare such a query. If SQL Server has to generate a dynamic query using T-SQL, there are two overheads: the time to generate the query, and the time to parse it and build the execution plan. The first would dominate for large lists, and may cost more time than is saved by using BETWEEN.

If the client generates the dynamic query, you could argue that you are transferring some load from the server to the client, though I am still sceptical that the benefits would be significant.

As such, unless I saw very noticeable performance increases, I would stick with the join. The principal reasons are ones of engineering:
- Time to develop
- Reliability of code (simple is always more reliable than clever tricks)
- Maintainability of code (will subsequent maintainers understand the trick?)

If you do test various combinations of JOIN and BETWEEN, with or without UNION, etc, I'd be very interested to see your performance results.

Dems
A: 

This is similar to a problem I recently faced: I needed to select from a table the rows whose primary keys appeared in a large list. The question was how to send that large list of keys to the SQL server in a form that is fast and efficient to use.

What works really well for this type of situation is the XML data type. If you create a stored procedure that takes one parameter of type XML, you can pre-format the input into an XML fragment. As an example, let's say the XML fragment looks like this:

<a>
  <b>1</b>
  <b>3</b>
  <b>7</b>
  <b>14</b>
  <b>147</b>
</a>

I gave the elements short names ("a" and "b") because a longer name would mean more bytes to transmit from the client to the SQL server. Here is how you would select all the contents of the "b" elements as a record set:

declare @x xml
set @x = '<a><b>1</b><b>3</b><b>7</b><b>14</b><b>147</b></a>'
select t.item.value('.', 'int') from @x.nodes('//a/b') as t(item)

Although the syntax is cryptic, the XML type can be queried just like a table. Now you should see where this is going. If we can query the XML type like a table, we can intersect that with another table:

select * from MyTable where ID in 
  (select t.item.value('.', 'int') from @x.nodes('//a/b') as t(item))

or using a join

;with cte as
  (select ID = t.item.value('.', 'int') from @x.nodes('//a/b') as t(item)) 
select * from MyTable inner join cte on MyTable.ID = cte.ID

You need to run both versions to see which will be faster for your data. I find the JOIN works faster with my data. Here is a stored procedure that takes the XML type as input and spits back our selected rows:

create procedure MyProc @x xml as
begin
  set nocount on
  ;with cte as
    (select ID = t.item.value('.', 'int') from @x.nodes('//a/b') as t(item)) 
  select * from MyTable inner join cte on MyTable.ID = cte.ID
end

A sample call of the new stored procedure:

exec MyProc '<a><b>1</b><b>3</b><b>7</b><b>14</b><b>147</b></a>'

I also found that adding an XML schema for the input fragment helped to speed up the stored procedure slightly. I won't go into the details of XML schemas here, but the idea is to tell SQL Server beforehand what the XML fragment will look like. Here's how we would create our schema:

create xml schema collection MyInputSchema as
  '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="a">
      <xsd:complexType>
        <xsd:complexContent>
          <xsd:restriction base="xsd:anyType">
            <xsd:sequence>
              <xsd:element name="b" type="xsd:integer" maxOccurs="unbounded" />
            </xsd:sequence>
          </xsd:restriction>
        </xsd:complexContent>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>'

Now we can associate this schema with the input to our stored procedure like this:

create procedure MyProc @x xml(MyInputSchema) as
begin
  set nocount on
  ;with cte as
    (select ID = t.item.value('.', 'int') from @x.nodes('//a/b') as t(item)) 
  select * from MyTable inner join cte on MyTable.ID = cte.ID
end

With all this in place, I was able to send an XML fragment of 43,016 characters from my client machine to the SQL server and get back a result set very quickly. I ran a test of 1,000 requests on each of 10 threads, for a total of 10,000 requests, and measured 72 requests processed per second. Of course your mileage will vary depending on your hardware and software.

NOTE: This code works on SQL 2005 and should also work on 2008.

Charles
A: 

How do you determine the list of primary key values to select on? I am wondering if we should look at this further 'upstream': is there some earlier query or queries you run to arrive at the list of keys? If so, perhaps you could join from that and skip the key lookup altogether.
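
For example, with a hypothetical upstream table:

-- Join straight from whatever produced the keys instead of materialising the list.
SELECT t.*
FROM MyTable t
INNER JOIN UpstreamTable u ON u.KeyID = t.[PrimaryKey]
WHERE u.SomeCondition = 'met'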

n8wrl
A: 

My first thought would be to look at how you are pulling in the list of keys. Is it from a query elsewhere? If so, have you tried

SELECT * FROM Table WHERE [PrimaryKey] in
(
    SELECT PrimaryKey from SomeOtherTable where Condition = 'met'
)

or if your conditions are in the same table, it's even simpler

SELECT * FROM Table WHERE condition = 'met'

Of course, if your keys are all coming from another source (your app, maybe), then I personally would stick with one of the previously suggested methods, or maybe bite the bullet and use one big IN clause.

BTW - sorry if I'm teaching anyone to suck eggs, it's just that my experience has been that sometimes the simplest solutions get overlooked.

ZombieSheep