The problem is that the query below runs very slowly with all three of its conditions, but much faster when run with only one or two of them.

Here is the query:

Select Count(*)
From 
    SearchTable 
Where 
    [Date] >= '8/1/2009' 
    AND 
    [Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150))
    AND 
    FreeText([Description], 'keyword list here')  

The first condition is self-explanatory. The second uses a UDF to get a list of ZIP codes within 150 miles of 30348. The third uses a full-text index to search for the provided words.

With only this condition

[Date] >= '8/1/2009' 

The query returns 43884 (table size is just under 500k rows) in 3 seconds.

Using only this condition

[Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150))

I get 27920, also returned in 3 seconds.

And with only the full text portion

FreeText([Description], 'keyword list here')

68404 is returned in 8 seconds.

When I use just the zip code and full text conditions I get 4919 in 4 seconds.
Just the date and full text conditions gets me 9481 in just shy of 14 seconds.
Using the date and Zip Code conditions only gives me 3238 in 14 seconds.
With all three conditions the query returns 723 in 2 minutes, 53 seconds. (wtfbbq)

+2  A: 

Because more conditions to check is more work for the database engine. Seems logical to me.

If you were to have one condition over a clustered index field, this particular check wouldn't slow down the operation that much. Have you considered rearranging indexes to match your query?

Developer Art
If it were just a simple matter of more conditions adding time, I would expect combining the zip code and full text conditions to take longer than either condition run separately, but it is twice as fast as the full text condition run alone. I understand it taking longer with more conditions, but it seems odd that I can write a query that creates a temp table, runs each piece individually, and reports the items that were in all three, and it runs in under 10 seconds. I guess I am asking too much.
Clark
Even if the DBMS took the naive approach of applying the conditions one at a time, why would it seem logical for the total time to jump to 173 seconds?
mbeckish
If you can get it to run in 10 seconds using the temp table approach, I suggest you use it.
HLGEM
NO! - SQL query performance is so much more complicated than "more filters = more work"!!!
Kragen
"Because more conditions to check is more work for the database engine" -- this is not true at all. In fact what happens (simplisticly) is that the query optimizer examines all the conditions and the available indexes and tries to create the fastest access plan. There is no direct relationship between how many where conditions and speed.
onupdatecascade
A: 

In terms of data transport, you are correct: less data means quicker completion. However, that time is usually minimal, and most of the time is spent on the actual query processing.

Look at it this way: If you were in a car lot, would it be easier to pick out all cars that were red, or all cars that were red, model year 2006, black interior, and had rubber floor mats?

John
+12  A: 

The only way to know why is to check the execution plan. Try SET SHOWPLAN_TEXT ON.
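For example, a minimal sketch using the query from the question - SHOWPLAN_TEXT returns the estimated plan as text instead of executing the statement, and must be the only statement in its batch:

SET SHOWPLAN_TEXT ON;
GO

Select Count(*)
From 
    SearchTable 
Where 
    [Date] >= '8/1/2009' 
    AND 
    [Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150))
    AND 
    FreeText([Description], 'keyword list here')
GO

SET SHOWPLAN_TEXT OFF;
GO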

mbeckish
This is definitely the best answer! The only way to know what is happening is to check the execution plan.
HLGEM
+1  A: 
  • String operations such as FreeText are expensive
  • The ZipCodesForRadius function can be expensive too, depending on how it is coded and whether the necessary indexes are present

If reordering the WHERE clauses does not speed things up, wrapping your select in an outer select may do the trick (it sped things up on some occasions with DB2/400; I am not sure how SQL Server optimizes):

Select Count(*)
From
(
    Select [Description]
    From 
        SearchTable 
    Where 
        [Date] >= '8/1/2009' 
        AND 
        [Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150))

) as t1
Where FreeText([Description], 'keyword list here')  
Danny T.
A: 

I suspect the Date field is not indexed. Without an index it can rely on to filter the result set before applying the conditions on the non-sargable columns, the optimizer gives them all equal weight and does not perform the quick filters first before applying the other, more expensive clauses.
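If that turns out to be the case, a simple index on the date column may be enough to change the plan (a sketch; the index name is made up):

Create Index IX_SearchTable_Date On SearchTable ([Date])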

When I am unable to tune the database using indexes, etc., I often find that rewriting the query along these lines is enough to steer the optimizer to a more efficient plan:

Select Count(*)
From (
    -- apply the zip filter in the inner query
    Select [Date], [Description]
    From SearchTable 
    Where [Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150))
) As t -- SQL Server requires an alias on a derived table
Where [Date] >= '8/1/2009' 
    AND FreeText([Description], 'keyword list here')  
RedFilter
+2  A: 

Try adding some indexes to your table, specifically ones that cover the conditions in your WHERE clause. Most likely it is currently doing a table scan to pull the data back, which can be very, very slow.
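For example, a sketch of a composite index covering the date and zip conditions (the index name is made up, and the best column order depends on your data):

Create Index IX_SearchTable_Date_Zip On SearchTable ([Date], [Zip])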

Also, you might want to use the Include Actual Execution Plan button in Management Studio to see how it goes about determining which records you get.

UPDATE

From one of your comments it sounds like this query is pulling from a temp table. In that case, apply indexes to the temp table after creating it. Adding the indexes and then running the queries will be faster than running a table scan on a 500k-row temp table.
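A minimal sketch of that idea - the #Candidates table and its columns are hypothetical stand-ins for whatever intermediate results you are materializing:

-- materialize the cheap filter into a temp table
Select [Date], [Zip], [Description]
Into #Candidates
From SearchTable
Where [Date] >= '8/1/2009'

-- index the temp table before querying it further
Create Index IX_Candidates_Zip On #Candidates ([Zip])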

Chris Lively
+5  A: 

Get an execution plan

You need to look at the execution plan in order to have any hope of understanding the real reason for the variation in response times. In particular, in this case there are several factors to consider:

  • It's possible that some of the queries returning more rows are faster because they are doing table scans - everyone has "table scans are slow" drilled into them, but depending on the data distribution it can well be faster to do a table scan than 50,000 individual row lookups. It's simply not possible to tell without an execution plan.
  • It's also possible that incorrect statistics are preventing SQL Server from accurately predicting the number of rows it expects to return - if SQL Server is expecting 20 rows but there are really 20,000, then in more complicated queries it is likely to do things in the wrong order, resulting in a very slow query - again, it's just not possible to tell without an execution plan.
  • In particular, the use of FreeText means that the full-text search engine is being used, which may cause SQL Server additional problems in predicting the number of rows returned.

Really, get an execution plan.

Update:

Possible causes

In the absence of an execution plan I think that the most likely cause of the slow execution is poor estimates for the conditions on ZipCode and Description:

  • It's difficult to estimate the number of matches on the ZipCode condition, as its result depends on a table-valued function.
  • It's difficult to estimate the number of matches on the FreeText condition, as it is based on results from the full-text query engine.

What I believe is happening is that SQL Server is under-estimating the number of rows that will remain after filtering and is applying the conditions in the wrong order. The result is that it ends up doing tens (possibly hundreds) of thousands of lookups, which is far, far slower than just doing a table scan.

For a particularly complicated query I've seen SQL server perform ~3,000,000 lookups attempting to return a single row - the table didn't even have 3,000,000 rows!

Things to try - put the results of ZipCodesForRadius into a temp table.

If I'm right, then to help with the first condition you could try putting the results of the ZipCodesForRadius function into a temporary table. I have to admit I don't have a good explanation as to why this helps, but I do have a few theories on why it could:

  • Better statistics on the temporary table
  • It will have the side effect of causing the main SELECT statement to be recompiled every time you run the query (unless the range of ZIP codes is very small) - as the query already takes a few seconds, this is a good thing if there is great variation in the matching zip codes. If not, there are ways of preventing the recompilation.

It certainly shouldn't do too much damage in any case.
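A minimal sketch of the temp table approach, assuming (as the question's query implies) that ZipCodesForRadius returns a ZipCode column:

-- materialize the zip codes once
Select ZipCode
Into #Zips
From dbo.ZipCodesForRadius('30348', 150)

Create Index IX_Zips_ZipCode On #Zips (ZipCode)

-- the main query now sees a plain indexed table with fresh statistics
Select Count(*)
From SearchTable 
Where [Date] >= '8/1/2009' 
    AND [Zip] In (Select ZipCode From #Zips)
    AND FreeText([Description], 'keyword list here')

Drop Table #Zips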

Kragen
A: 

If you have only one condition to count, the query can scan the narrowest index that covers the count. Even if it is a full scan, the number of pages read is much smaller than that of a clustered index scan, which is probably much wider. When you have multiple conditions, the candidate rows have to be joined, and the query plan may abandon the non-clustered index scans (or range scans) and go for a full table scan.

In your case, what likely happens is:

  • [Date] >= '8/1/2009' is satisfied by an index that contains Date, most likely an index ON Date, so it's a fast range scan
  • [Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150)) - same as Date. Even if you don't have an index on Zip, you likely have one that contains Zip.
  • FreeText([Description], 'keyword list here') - a full-text search for the count, which goes through the internal FT indexes, so it's fast.

  • All three conditions. Now it gets messy. If you have enough RAM, the query can plan the FT search first, then HASH-JOIN the Zip scan, then HASH-JOIN the Date. This would be fast, on the order of 3+3+8 seconds plus change (for the hash operations). But if you don't have enough RAM, or if the optimizer doesn't like to do a hash join, it will have to do an FT search, then a nested-loop search of Zip, then a nested-loop search of Date, and it may hit the index tipping point in its decisions. So most likely you get a table scan. This is of course speculation on my part, but after all you posted just the T-SQL text and zero information about the structure of your clustered and non-clustered indexes.
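If the hash join theory is right, you can test it with a query hint (a sketch; OPTION (HASH JOIN) forces hash joins throughout the query, which is a blunt instrument best kept to experiments):

Select Count(*)
From SearchTable 
Where [Date] >= '8/1/2009' 
    AND [Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150))
    AND FreeText([Description], 'keyword list here')
Option (Hash Join)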

In the end you have to remember that SQL is not your C-like procedural language. Performance in SQL is never about comparisons and boolean logic. It's always about data access and the number of pages read. So even though each individual condition can be satisfied by a small, fast range scan of a narrow non-clustered index or FT index, the combination cannot (or, in this case, the query optimizer did not figure out a way to make it so).

Remus Rusanu