views:

72

answers:

2

We have an interesting problem that I was hoping someone could help to shed some light on. At a high level the problem is as below:

The following query executes quickly (1 second):

SELECT SA.* FROM cg.SEARCHSERVER_ACTYS AS SA JOIN CONTAINSTABLE(CG.SEARCHSERVER_ACTYS, NOTE, 'reports') AS T1 ON T1.[Key]=SA.UNIQUE_ID

but if we add a filter to the query, then it takes approximately 2 minutes to return:

SELECT SA.* FROM cg.SEARCHSERVER_ACTYS AS SA JOIN CONTAINSTABLE(CG.SEARCHSERVER_ACTYS, NOTE, 'reports') AS T1 ON T1.[Key]=SA.UNIQUE_ID WHERE SA.CHG_DATE>'19 Feb 2010'

Looking at the execution plan for the two queries, I can see that in the second case there are two places where there are huge differences between the actual and estimated number of rows, these being:

1) For the FulltextMatch table valued function where the estimate is approx 22,000 rows and the actual is 29 million rows (which are then filtered down to 1670 rows before the join) and 2) For the index seek on the full text index, where the estimate is 1 row and the actual is 13,000 rows

As a result of the estimates, the optimiser is choosing to use a nested loops join (since it assumes a small number of rows) hence the plan is inefficient.

We can work around the problem by either (a) parameterising the query and adding an OPTION (OPTIMIZE FOR UNKNOWN) to the query or (b) by forcing a HASH JOIN to be used. In both of these cases the query returns in sub 1 second and the estimates appear reasonable.

My question really is 'why are the estimates being used in the poorly performing case so wildly inaccurate and what can be done to improve them'?

Statistics are up to date on the indexes on the indexed view being used here.

Any help greatly appreciated.

A: 

Perhaps you could add some statistics on the column in question - that will help SQL Server make better estimates about both the number of rows and their contents.

What statistics or indexes are currently involved?

rwmnau
The indexes currently being used are: the clustered index on the indexed view, which is on the UNIQUE_ID column (which is being scanned to find dates greater than the passed in date) and then the fulltext index (a seek being performed for items with this unique_id)There exists an index on the indexed view on the CHG_DATE column but this does not appear to be being used.
Paul McLoughlin
I can understand the initial index scan of the clustered index here, but the area I am confused on is the index seek of the fulltext index - this is where the estimate (1 row) and the actual (13,000) confuse me the most.
Paul McLoughlin
+1  A: 

The problem here turned out to be with the version of SQL Server. The problem manifested itself with SQL Server 2008 (no service pack) and was resolved by upgrading to SQL Server 2008 SP1 (and adding CU5). Since we did not test without CU5 installed I cannot determine if the fix came with SP1 or CU5. No matter, the issue is resolved. Morale? Keep your server up to date.

Paul McLoughlin
Interesting. I have exactly the same problem but I have the latest version of 2008 R2. Where the result set is small, it seems to choose the wrong execution plan. Adding OPTION (HASH JOIN) seems to fix the problem, but I don't know enough about it to say for sure. I'm worried that I may be slowing it down unnecessarily where the result set is larger. Could it be something wrong with statistics? What do you reckon?
Tim