We have an interesting problem that I was hoping someone could help shed some light on. At a high level, the problem is as follows:
The following query executes quickly (1 second):
SELECT SA.*
FROM cg.SEARCHSERVER_ACTYS AS SA
JOIN CONTAINSTABLE(CG.SEARCHSERVER_ACTYS, NOTE, 'reports') AS T1
    ON T1.[Key] = SA.UNIQUE_ID
but if we add a filter to the query, it takes approximately 2 minutes to return:
SELECT SA.*
FROM cg.SEARCHSERVER_ACTYS AS SA
JOIN CONTAINSTABLE(CG.SEARCHSERVER_ACTYS, NOTE, 'reports') AS T1
    ON T1.[Key] = SA.UNIQUE_ID
WHERE SA.CHG_DATE > '19 Feb 2010'
Comparing the execution plans for the two queries, I can see that in the second case there are two places with huge differences between the estimated and actual number of rows, these being:
1) the FulltextMatch table-valued function, where the estimate is approximately 22,000 rows but the actual is 29 million rows (which are then filtered down to 1,670 rows before the join), and
2) the index seek on the full-text index, where the estimate is 1 row but the actual is 13,000 rows.
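For reference, these estimated versus actual counts were taken from the actual execution plan; something along these lines shows the same numbers in text form (SET STATISTICS PROFILE is just one way of capturing them):

SET STATISTICS PROFILE ON;

SELECT SA.*
FROM cg.SEARCHSERVER_ACTYS AS SA
JOIN CONTAINSTABLE(CG.SEARCHSERVER_ACTYS, NOTE, 'reports') AS T1
    ON T1.[Key] = SA.UNIQUE_ID
WHERE SA.CHG_DATE > '19 Feb 2010';

SET STATISTICS PROFILE OFF;

-- The EstimateRows and Rows columns in the output are where the
-- FulltextMatch and full-text index seek discrepancies show up.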
As a result of these estimates, the optimiser chooses a nested loops join (since it assumes a small number of rows), and hence the plan is inefficient.
We can work around the problem either by (a) parameterising the query and adding OPTION (OPTIMIZE FOR UNKNOWN), or by (b) forcing a HASH JOIN, roughly as sketched below. In both cases the query returns in under 1 second and the estimates appear reasonable.
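The two workarounds look something like this (the @chgDate parameter name is just illustrative):

-- (a) parameterised, with OPTIMIZE FOR UNKNOWN so the optimiser uses an
--     average density estimate rather than sniffing the supplied date
EXEC sp_executesql
    N'SELECT SA.*
      FROM cg.SEARCHSERVER_ACTYS AS SA
      JOIN CONTAINSTABLE(CG.SEARCHSERVER_ACTYS, NOTE, ''reports'') AS T1
          ON T1.[Key] = SA.UNIQUE_ID
      WHERE SA.CHG_DATE > @chgDate
      OPTION (OPTIMIZE FOR UNKNOWN);',
    N'@chgDate DATETIME',
    @chgDate = '19 Feb 2010';

-- (b) forcing a hash join at the query level
SELECT SA.*
FROM cg.SEARCHSERVER_ACTYS AS SA
JOIN CONTAINSTABLE(CG.SEARCHSERVER_ACTYS, NOTE, 'reports') AS T1
    ON T1.[Key] = SA.UNIQUE_ID
WHERE SA.CHG_DATE > '19 Feb 2010'
OPTION (HASH JOIN);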
My question really is: why are the estimates in the poorly performing case so wildly inaccurate, and what can be done to improve them?
Statistics are up to date on the indexes of the indexed view being used here.
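For what it's worth, checking and refreshing them can be done along these lines (standard catalog views, nothing specific to full text):

-- when was each statistic on the view last updated?
SELECT s.name AS stat_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('cg.SEARCHSERVER_ACTYS');

-- refresh with a full scan to rule out sampling issues
UPDATE STATISTICS cg.SEARCHSERVER_ACTYS WITH FULLSCAN;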
Any help greatly appreciated.