views:

320

answers:

3
+2  Q: 

SQL Index Seek

I've got an odd thing going on with a query. I have a table with two bigints, StartNum and EndNum that defines number ranges. I have an index on these two columns. The query takes a given number and returns the range it falls into. The WHERE clause is where @Num >= StartNum and @Num <= EndNum.

There's a big difference in the Profiler between searching for a number near the start of the list and one near the end. The further down the list the number is, the more Reads and Duration it takes. The query plan shows that it's using a seek on my index.

As an index is a balanced tree, surely there shouldn't be that much difference. Can anybody explain this to me, please?

Small print: This is on SQL 2005 Workgoup edition. There are about 200,000 rows in the table. The index is nonclustered (the clustered index is on an identity column but the data was inserted in StartNum order). The index has 716 pages, has a depth of 3, is 3% fragmented.

+1  A: 

It depends on your query. If you are only querying for these two values, e.g.:

SELECT StartNum, EndNum
FROM Books
WHERE (@Num >= StartNum) AND (@Num <= EndNum)

In this case, SQL Server only has to seek through the two indexes to return the numbers. Your two indexes contain the value you're indexed, and because it's an index they're sorted.

But i'm sure you're actually including other columns in your query:

SELECT BookID, Title, IDBN, Author, StartNum, EndNum
FROM Books
WHERE (@Num >= StartNum) AND (@Num <= EndNum)

In this case, SQL Server, once it has found the id's of the rows that match the criteria, must then go back into the database and find those rows so that it can return you the:

  • Title
  • ISBN
  • Author

in addition to the values it already has from the two indexes:

  • StartNum
  • EndNum
  • BookID

Note: An index on StartNum implicily contains the value of the clustered key, since that's how it knows what row corresponds to an entry in the index.

The problem is that if there are "too many books" that it has to go look up in the table, then it might just be faster to read the entire table top to bottom.


Analogy: You look in the index of a book for all references to "design patterns".

design patterns: 4, 89, 221, 442

If there's only 4 entries, then you'd be okay with flipping backwards to the page listed in the index. This is called a bookmark lookup.

But what if the index said there was 827 references to a phrase?

computer: 1, 2, 6, [snip 825 entries], 1087, 1128

Then it might just be faster to read through the book to find them yourself. In this case we give up and scan the entire book. SQL Server calls this a clustered index scan, if the table has a clustered index, or a table scan if there is no clustered index (i.e. it's a "heap table")


If you were simply referencing StartNum and EndNum in your query (let say you were getting a count), then i'm sure you'll see consistently low reads and execution times. But if you include other columns, and SQL Server thinks that there is too many rows that will be returned from that query (more than 5% of the table, for example), then it is just going to forget the index and scan the entire table.

There is a crossover point, where SQL Server knows the distribution of StartNum and EndNum values in your table, because it samples the values and has statistics on their distribution. If some values of @Num will happen to return few rows, and SQL Server knows this, it will perform the relativly few bookmark lookups. But if your data distribution would cause more rows to return, then you'll get the clustered index scan.

Ian Boyd
Thanks, Ian, but it really is as simple as I said. I started with more going on so I stripped it all back to just selecting the columns from the index. There are no bookmark lookups or anything else. Just an Index Seek on my index. Still getting widely varying Reads and Duration depending on how far down the 'list' the sought number falls.
David Wimbush
A: 

Are you indexed on StartNum ASC, EndNum DESC?

Because I expect it's not able to do two seeks if you are indexed on StartNum ASC, EndNum ASC. It has to do a seek and a scan.

Another possibility is to index separately on StartNum and EndNum (order wouldn't matter) and see if it will use both indexes in sequence and then do some kind of join.

You can't really tell until you look at the differing execution plans.

I'm having trouble duplicating the behavior without the real columns. I set up a test table with 200000 StartNum-EndNum ranges 100-199, 200-299, etc. I've been able to get it to do a clustered index scan, table scan, and a non-clustered index seek, but nothing close to what you are talking about.

Cade Roux
Thanks for that, Cade. The index is StartNum asc, EndNum asc and it's literally just a single index seek in the query plan. I tried asc, desc but that didn't help. I tried separate indexes and then it used a seek on the StartNum index combined with a table scan to find those <= EndNum.
David Wimbush
A: 

Cracked it!

EndNum is always >= StartNum so I just need to find the biggest StartNum that is <= @Num. I changed the index to StartNum desc and the query to top 1 ... where StartNum <= @Num order by StartNum desc. Now for any value of @Num it's just an index seek with 5 reads and 0 duration, which is just what I was after.

Thanks for the help, guys.

David Wimbush