Hello,

I am using SQL Server 2008 full-text search, and I am having serious performance issues depending on how I use CONTAINS or CONTAINSTABLE.

Here are some samples. (table1 has about 5,000 records, and there is a covering index on table1 that contains all the fields in the WHERE clause. I have tried to simplify the statements, so forgive me if there are syntax issues.)

Scenario 1:

select * from table1 as t1
where t1.field1 = 90
  and t1.field2 = 'something'
  and exists (select 1 from containstable(table1, *, 'something') as t2
              where t2.[key] = t1.id)

results: 10 seconds (very slow)

Scenario 2:

select * from table1 as t1
  join containstable(table1, *, 'something') as t2 on t2.[key] = t1.id
where t1.field1 = 90
  and t1.field2 = 'something'

results: 10 seconds (very slow)

Scenario 3:

Declare @tbl Table(id uniqueidentifier primary key)
insert into @tbl select [key] from containstable(table1, *, 'something')

select * from table1 as t1
where t1.field1 = 90
  and t1.field2 = 'something'
  and exists (select id from @tbl as tbl where tbl.id = t1.id)

results: fraction of a second (super fast)

Bottom line: it seems that if I use CONTAINSTABLE in any kind of join or WHERE-clause condition of a select statement that also has other conditions, the performance is really bad. In addition, if you look at Profiler, the number of reads from the database goes through the roof. But if I first do the full-text search, put the results in a table variable, and use that variable instead, everything goes super fast, and the number of reads is also much lower. It seems that in the "bad" scenarios it somehow gets stuck in a loop, which causes it to read from the database many times, but of course I don't understand why.

Now, the first question is: why is that happening? The second question is: how scalable are table variables? What if the search results in tens of thousands of records? Is it still going to be fast?

Any ideas? Thanks

A: 

I'm going to take a guess here that your issue is the same as on the other thread I linked to. Are you finding the issue arises with multiple word search terms?

If so my answer from that thread will apply.

From http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506240

The most important thing is that the correct join type is picked for full-text query. Cardinality estimation on the FulltextMatch STVF is very important for the right plan. So the first thing to check is the FulltextMatch cardinality estimation. This is the estimated number of hits in the index for the full-text search string. For example, in the query in Figure 3 this should be close to the number of documents containing the term ‘word’. In most cases it should be very accurate but if the estimate was off by a long way, you could generate bad plans. The estimation for single terms is normally very good, but estimating multiple terms such as phrases or AND queries is more complex since it is not possible to know what the intersection of terms in the index will be based on the frequency of the terms in the index. If the cardinality estimation is good, a bad plan probably is caused by the query optimizer cost model. The only way to fix the plan issue is to use a query hint to force a certain kind of join or OPTIMIZE FOR.

So it simply cannot know, from the information it stores, whether the two search terms together are likely to be quite independent or commonly found together. Maybe you should have two separate procedures: one for single-word queries that you let the optimiser do its stuff on, and one for multi-word search terms that you force a "good enough" plan on (sys.dm_fts_index_keywords might help if you want to do a rough estimate of cardinality yourself).
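For example, here is a rough sketch of both ideas; dbo.table1 and the search terms are placeholders taken from the question, and the HASH JOIN hint is just one possible choice, not a recommendation:

-- Rough per-term cardinality from the full-text index
-- (sys.dm_fts_index_keywords is available from SQL Server 2008 on):
select display_term, document_count
from sys.dm_fts_index_keywords(db_id(), object_id('dbo.table1'))
where display_term in ('something', 'otherword')
order by document_count desc

-- Forcing a "good enough" plan for multi-word terms instead of
-- trusting the intersection estimate:
select t1.*
from table1 as t1
  join containstable(table1, *, '"something" AND "otherword"') as t2
    on t2.[key] = t1.id
where t1.field1 = 90
option (hash join)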

If you are getting the issue with single-word queries, this passage from the linked article might apply.

In SQL Server 2008 full-text search we have the ability to alter the plan that is generated based on a cardinality estimation of the search term used. If the query plan is fixed (as it is in a parameterized query inside a stored procedure), this step does not take place. Therefore, the compiled plan always serves this query, even if this plan is not ideal for a given search term.

So you might need to use the RECOMPILE option.
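A minimal sketch of that, assuming the query lives in a parameterized procedure (@searchTerm and @field1 are placeholder parameters):

declare @searchTerm nvarchar(100) = N'"something"'
declare @field1 int = 90

-- OPTION (RECOMPILE) re-costs the plan for the actual search term, so a
-- term matching 5 rows and one matching 5,000 can get different plans:
select t1.*
from table1 as t1
  join containstable(table1, *, @searchTerm) as t2 on t2.[key] = t1.id
where t1.field1 = @field1
option (recompile)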

Martin Smith
Thanks Martin for your elaborate response. However, my issue is not with multiple words. As a matter of fact, the search on the full text is always extremely fast, whether it is one word or multiple words. The issue for me is that the performance degrades dramatically when the full-text search is combined with other conditions in the WHERE clause. Since I am going over the allowed limit for the size of this comment, please see the next comment for the rest of my answer...
Bob
@Bob Still very strange that when you join from your table variable, and it essentially has to perform the same join of id to [key], it works fine. How does it do that join - does it choose a different index, or a different join strategy? Also, when you look at the execution plan, are the estimated and actual rows reasonably correct for all parts of it?
Martin Smith
+1  A: 

I spent quite some time on this issue, and based on running many scenarios, this is what I figured out:

If you have CONTAINS or CONTAINSTABLE anywhere in your query, that is the part that gets executed first, and rather independently. Meaning that even if the rest of the conditions limit your search to only one record, neither CONTAINS nor CONTAINSTABLE cares about that. So it is like a parallel execution.

Now, since the full-text search only returns a [key] field, it immediately looks for the [key] as the first field of the other indexes chosen for the query. So for the example above, it looks for an index on [key], field1, field2. The problem is that it chooses the index for the rest of the query based on the fields in the WHERE clause, so for the example above it picks the covering index that I have, which is something like field1, field2, id. (The id of the table is the same as the [key] returned from the full-text search.) So, to summarize, it:

1) executes CONTAINSTABLE
2) executes the rest of the query, picking an index based on the WHERE clause of the query
3) tries to merge these two.

Therefore, if the index it picked for the rest of the query starts with the [key] field, it is fine. However, if the index does not have the [key] field as the first key, it starts doing loops. It does not even do a table scan; otherwise going through 5,000 records would not be that slow. The way it does the loop is that it runs it for the total number of results from the FTS multiplied by the total number of results from the rest of the query. So if the FTS returns 2,000 records and the rest of the query returns 3,000, it loops 2000 * 3000 = 6,000,000 times. I do not understand why.
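One way to check whether this is really what is happening is to compare the diagnostics for the slow and fast versions (a sketch using the standard SET options, nothing specific to my query):

-- Compare logical reads and estimated vs. actual row counts; a huge
-- read count on the CONTAINSTABLE version points at the nested loop:
set statistics io on
set statistics profile on

-- run the slow query here, then the table-variable version,
-- and compare the two outputs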

So in my case it does the full-text search, then it does the rest of the query but picks the covering index that I have, which is based on field1, field2, id (which is wrong), and as a result it screws up. If I changed my covering index to id, field1, field2, everything would be very fast.
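In DDL terms, the change would look something like this (the index name is made up, and as I explain below I cannot actually do this):

-- Hypothetical: move id to the front so the optimizer can seek on the
-- [key] values coming back from the full-text search:
drop index IX_table1_covering on table1
create index IX_table1_covering on table1 (id, field1, field2)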

My expectation was that FTS returns a bunch of [key] values, the rest of the query returns a bunch of [id] values, and then [id] is matched against [key].

Of course, I have tried to simplify my query here; the actual query is much more complicated and I cannot just change the index. I also have scenarios where the text passed to the full-text search is blank, and in those scenarios I do not even want to join with CONTAINSTABLE. In those cases, changing my covering index to have the id field as the first field would be a disaster.

Anyway, for now I chose the table-variable solution, since it is working for me. I am also limiting the result to a few thousand rows, which helps with the potential performance issues of table variables when the number of records gets too high.
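Roughly like this (the 2,000-row cap is illustrative; ordering by rank keeps the best matches):

declare @tbl table (id uniqueidentifier primary key)

-- Materialize the best full-text hits first, capped to keep the
-- table variable small:
insert into @tbl
select top 2000 [key]
from containstable(table1, *, 'something')
order by [rank] desc

select * from table1 as t1
where t1.field1 = 90
  and t1.field2 = 'something'
  and exists (select 1 from @tbl as tbl where tbl.id = t1.id)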

thanks

Bob
A: 

Normally it works very fast:

select t1.*, t2.[rank]
    from containstable(table1, field2, 'something') as t2
        join table1 as t1 on t1.id = t2.[key] and t1.field1 = 90
    order by t2.[rank] desc

There is a big difference depending on where you put your search criteria: in the JOIN or in the WHERE clause.
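For contrast, this is the shape that performed badly in the question, with the filter moved out of the join and into the WHERE clause:

-- Same query with the filter in the WHERE clause instead of the join;
-- per the question, this form was far slower:
select t1.*, t2.[rank]
    from containstable(table1, field2, 'something') as t2
        join table1 as t1 on t1.id = t2.[key]
    where t1.field1 = 90
    order by t2.[rank] desc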

Ghen