views:

61

answers:

2

I am running two very similar update queries but for a reason unknown to me they are using completely different execution plans. Normally this wouldn't be a problem but they are both updating exactly the same amount of rows but one is using an execution plan that is far inferior to the other, 4 secs vs 2 mins, when scaled up this is causing me a massive problem.

The only difference between the two queries is one is using the column CLI and the other DLI. These columns are exactly the same datatype, and are both indexed exactly the same, but for the DLI query execution plan, the index is not used.

Any help as to why this is happening is much appreciated.

-- Query 1
UPDATE a
 SET DestKey = ( 
 SELECT TOP 1 b.PrefixKey 
 FROM refPrefixDetail AS b 
 WHERE a.DLI LIKE b.Prefix + '%' 
 ORDER BY len(b.Prefix) DESC )
FROM CallData AS a

-- Query 2
UPDATE a
 SET DestKey = ( 
 SELECT TOP 1 b.PrefixKey 
 FROM refPrefixDetail b 
 WHERE a.CLI LIKE b.Prefix + '%' 
 ORDER BY len(b.Prefix) DESC )
FROM CallData AS a
+1  A: 

Try updating your statistics. If that does not help try rebuilding your indexes. It is possible that the cardinality of the data in each column is quite different, causing different execution plans to be selected.

RedFilter
I have delete my indexes and recreated, I have rebuilt the indexes, updated the statistics and truncated and reinserted the data. None of these has any effect on the query execution plans. :-(
Graham Clements
+3  A: 

Examine the statistics on these two columns on the table (How the data values for the columns are distributed among all the rows). This will propbably explain the difference... One of these columns may have a distribution of values that could cause the query, in processsing, to need to examine a substantially higher number of rows than would be required by the other query, (The number or rows updated is controlled by the Top 1 part remember) then it is possible that the query optimizer will choose not to use the index... Updating statistics will make them more accurate, but if the distribution of values is such that the optimizer chooses not to use the index, then you may be out of luck...

Understanding how indices work is useful here. An index is a tree-structure of nodes, where each node (starting with a root node) contains information that allows the query processor to determine which branch of the tree to go to next, based on the value it is "searching" for. It is analogous to a binary-Tree except that in databases the trees are not binary, at each level there may be more than 2 branches below each node.

So, for an index, to traverse the index, from the root to the leaf level, requires that the processor read the index once for each level in the index hiearchy. (if the index is 5 levels deep for example, it needs to do 5 I/O operations for each record it searches for.

So in this example, say, if the query need to examine more than approximately 20% of the records in the table, (based on the value distribution of the column you are searching against), then the query optimizer will say to itself, "self, to find 20% of the records, with five I/O s per each record search, is equal to the same number of I/Os as reading the entire table.", so it just ignores the index and does a Table scan.

There's really no way to avoid this except by adding additonal criteria to your query to furthur restrict the number of records the query must examine to generate it's results....

Charles Bretana
Thanks for your prompt response.I have examined the statistics for both these indexes, I'm not really sure what I'm looking for here, any suggestions? There are some differences. Is there anything I can do about this, for example is there any way of forcing the optimiser to use an index as the route it seems to be taking is far less efficient. It seems pointless me having the index if it is not improving my query.
Graham Clements
Understanding how the index works is useful here. I am editing my answer to add this....
Charles Bretana
Thanks for the explaination, that certainly help me understand why this is happening. Unfortunately I cannot limit the result any further so I guess I am going to have to live with this, as from what you are saying it is in fact the most efficiant way.
Graham Clements
ANother option (should you have this freedom) is to modify / optimize the schema of your table - change CallData so that the prefix portion of the DlI and CLI columns are stored separately... Then you could potentially put an indexed view on CallData on that Prefix column and that would speed things up substantially
Charles Bretana