I'm creating a stored procedure to search data in my database according to criteria input by the user.

My SQL code looks like this:

CREATE PROCEDURE mySearchProc
(
@IDCriteria bigint = NULL,
...
@MaxDateCriteria datetime = NULL
)
AS
SELECT Col1,...,Coln FROM MyTable
WHERE (@IDCriteria IS NULL OR ID = @IDCriteria)
...
AND (@MaxDateCriteria IS NULL OR Date < @MaxDateCriteria)

Edit: I have around 20 possible parameters, and any combination of n non-null parameters can occur.

Is it OK performance-wise to write this kind of code? (I'm using MS SQL Server 2008.)

Would generating SQL code containing only the needed where clauses be notably faster?

+3  A: 

OR clauses are notorious for causing performance issues, mainly because they force table scans. If you can write the query without ORs, you'll be better off.

Otávio Décio
You mean that SQL Server won't find out that the whole OR clause is always true, or at least won't optimize accordingly?
Brann
Brann, no, because when SQL Server builds the execution plan it doesn't take the parameter values into account. It has to assume the worst.
Logicalmind
+2  A: 
where (@IDCriteria is null or ID=@IDCriteria)
  and (@MaxDateCriteria is null or Date<@MaxDateCriteria)

If you write the criteria this way, SQL Server will not know whether it is better to use the index on ID or the index on Date.

For proper optimization, it is far better to write separate queries for each case and use IF to guide you to the correct one.

IF @IDCriteria IS NOT NULL AND @MaxDateCriteria IS NOT NULL

  --query
  WHERE ID = @IDCriteria AND Date < @MaxDateCriteria

ELSE IF @IDCriteria IS NOT NULL

  --query
  WHERE ID = @IDCriteria

ELSE IF @MaxDateCriteria IS NOT NULL

  --query
  WHERE Date < @MaxDateCriteria

ELSE

  --query
  WHERE 1 = 1

If you expect to need different plans out of the optimizer, you need to write different queries to get them!!
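
For concreteness, a minimal sketch of that branching, filling in the placeholder queries against the question's table with a stand-in column list; each branch can get its own cached plan:

IF @IDCriteria IS NOT NULL AND @MaxDateCriteria IS NOT NULL
    SELECT Col1, Col2 FROM MyTable WHERE ID = @IDCriteria AND Date < @MaxDateCriteria;
ELSE IF @IDCriteria IS NOT NULL
    SELECT Col1, Col2 FROM MyTable WHERE ID = @IDCriteria;
ELSE IF @MaxDateCriteria IS NOT NULL
    SELECT Col1, Col2 FROM MyTable WHERE Date < @MaxDateCriteria;
ELSE
    SELECT Col1, Col2 FROM MyTable;  -- no criteria supplied: return everything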

Would generating SQL code containing only the needed where clauses be notably faster?

Yes - if you expect the optimizer to choose between different plans.


Edit:

DECLARE @CustomerNumber int, @CustomerName varchar(30)

SET @CustomerNumber = 123
SET @CustomerName = '123'

SELECT * FROM Customers
WHERE (CustomerNumber = @CustomerNumber OR @CustomerNumber is null)
  AND (CustomerName = @CustomerName OR @CustomerName is null)

CustomerName and CustomerNumber are indexed. The optimizer says: "Clustered Index Scan with parallelization". You can't write a worse single-table query.


Edit: I have around 20 possible parameters, and any combination of n non-null parameters can occur.

We had a similar "search" functionality in our database. When we looked at the actual queries issued, 99.9% of them used an AccountIdentifier. In your case, I suspect either one column is always supplied, or one of two columns is always supplied. This would lead to 2 or 3 cases respectively.

It's not important to remove ORs from the whole structure. It is important to remove ORs from the column(s) that you expect the optimizer to use to access the indexes.
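
As an illustration, a sketch under the assumption (hypothetical here) that AccountIdentifier is always supplied, so it drives the index access while the rarely used filters keep their OR form:

SELECT Col1, Col2
FROM MyTable
WHERE AccountIdentifier = @AccountIdentifier  -- assumed always non-null: enables an index seek
  AND (@MaxDateCriteria IS NULL OR Date < @MaxDateCriteria)  -- optional filters can keep the OR form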

David B
But the "optimizer choosing between different plans" would mean that you actually throw an arbitrary number of "new" queries at the server, forcing it to create a new query plan every time. Since the plan cache won't be there forever, wouldn't this rather impair performance?
Tomalak
I think there is a risk that if this is in a single sproc it will create a cached query plan based on the first usage - which might not be representative. You could avoid this by having each IF branch execute a different, specific sub-procedure, which will be separately cached.
Kristen
@David Unfortunately I've too many parameters to take the approach you're suggesting (I edited my question accordingly)
Brann
@David B: I get a straightforward Index Seek out of a query equivalent to your last edit. I don't think you can generalize it to "You can't write a worse single-table query."
Tomalak
@David B: However, I do get a "Clustered Index Scan w/parallelism" when I change to ISNULL or COALESCE.
Tomalak
@Tomalak: do you have one index on (CustomerNumber, CustomerName) or two indexes?
Logicalmind
@Logicalmind: The table has two separate indexes. The query is turned into an Index Seek on one of them, plus a Bookmark Lookup. But only as long as I maintain the OR. Anything else (CASE WHEN, ISNULL, COALESCE) results in a Clustered Index Scan w/parallelism.
Tomalak
To scale that, he'd have to have indexes on every column that could appear in the where clause. I think David's point was that this is unlikely. He would likely have multiple columns in fewer indexes. When the optimizer doesn't know if the leading columns are included it must do a scan.
Logicalmind
To David's last edit: he is correct, the leading columns on the indexes should be the most selective, and the leading columns should not be optional. This solves your problem, if you can meet those requirements.
Logicalmind
A: 

Regarding "Would generating SQL code containing only the needed where clauses be notably faster?"

I don't think so, because this way you effectively remove the positive effects of query plan caching.

Tomalak
Query plan caching is irrelevant if the optimizer chooses the wrong (or no) index.
David B
I thought the query plan would be a "compiled" form of the query that the server would keep for a while in case the same query comes by *again* after a while. Can you explain why it would be irrelevant when no index was involved? Thanks :)
Tomalak
If you create dynamic SQL and use sp_ExecuteSQL so that the query is parameterised, then yes, it will be faster. Each possible combination of the parameters will need to be cached (so in practice the "popular" variations will be cached). But you will need SELECT permission, and SQL injection needs care.
Kristen
But you would get a delay every time a new combination of parameters comes up for the first time (the OP indicates there are 20 possible parameters), which may result in a more severe slowdown than having one query that needs to be parsed/cached only once, methinks.
Tomalak
Yes, that's true. But the "popular" queries would be cached. If the site is busy then it is likely that all/most combinations will be cached. If the site is not busy then there shouldn't be a performance issue whichever method is chosen!
Kristen
That popular ones will benefit is true as well, though this depends on the usage pattern. Anyway: having 20 parameters will result in 2^20 possible query plans. Rhetorical question: will the plan cache hold them all, or will it throw away the older ones bit by bit?
Tomalak
If there is space in the cache, it will hold them all :) If not, then the "unpopular" ones will be dropped.
Kristen
as usual, it depends ;-) http://blogs.msdn.com/sqlprogrammability/archive/2007/01/22/3-0-changes-in-caching-behavior-between-sql-server-2000-sql-server-2005-rtm-and-sql-server-2005-sp2.aspx
Logicalmind
"Can you explain"... If the query is a table scan against 40M records instead of an index scan for 20 records, it will perform poorly whether or not it is in the cache.
David B
@David B: Absolutely correct. I simply got you wrong (thinking that the query plan itself would become irrelevant - it won't, but it also won't help the overall performance very much if the plan is poor). This leaves the question why the index scan you predicted doesn't occur in my test.
Tomalak
"it will perform poorly whether or not it is in the cache" Yup. But the time to make a new query plan is saved if there is one already in the cache (for a complex query time-to-make-query-plan can be significant) - which may represent a saving.
Kristen
@Kristen - *shrug*. Have fun with that IO for 40 million records. @Tomalak - I'm not feeling snarky enough to say UserError, plus I can't think of any way to mangle the example to produce the result you saw. I guess it's an XFile.
David B
"Have fun with that IO for 40 million records" Indeed :( But I would optimise the popular queries, and for the rest the fact that the DB Server CAN find the answer REASONABLE quickly is usually "acceptable"
Kristen
You don't want query plan caching in this case
erikkallen
A: 

You could run selective queries, in order of the most common / most efficient (indexed, etc.) parameters, and add the matching PK(s) to a temporary table.

That would create a (hopefully small!) subset of data.

Then join that temporary table with the main table, using the full WHERE clause in this style to refine the (small) subset:

SELECT ...
FROM @TempTable AS T
    JOIN dbo.MyTable AS M
        ON M.ID = T.ID
WHERE (@IDCriteria IS NULL OR M.ID=@IDCriteria)
...
AND (@MaxDateCriteria IS NULL OR M.Date<@MaxDateCriteria)

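For illustration, a minimal sketch of the seeding step, assuming (purely as an example) that Date is the most selective indexed criterion:

DECLARE @TempTable TABLE (ID bigint PRIMARY KEY);

-- Seed the subset from the most selective criterion that was supplied
-- (Date is only an assumed example here)
IF @MaxDateCriteria IS NOT NULL
    INSERT INTO @TempTable (ID)
    SELECT ID FROM dbo.MyTable WHERE Date < @MaxDateCriteria;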

Kristen
A: 

What if constructs like these were replaced:

WHERE (@IDCriteria IS NULL OR @IDCriteria=ID)
AND (@MaxDateCriteria IS NULL OR Date<@MaxDateCriteria)
AND ...

with ones like these:

WHERE ID = ISNULL(@IDCriteria, ID)
AND Date < ISNULL(@MaxDateCriteria, DATEADD(millisecond, 1, Date))
AND ...

or is this just coating the same unoptimizable query in syntactic sugar?

David M. Miller
A: 

Choosing the right index is hard for the optimizer. IMO, this is one of the few cases where dynamic SQL is the best option.

erikkallen
+1  A: 

So, to boil down the above comments:

Create a separate sub-procedure for each of the most popular combinations of parameters, and within a dispatcher procedure call the appropriate one from an IF ELSE structure, the final ELSE clause of which builds a query dynamically to cover the remaining cases.

Perhaps only one or two cases may be specifically coded at first, but as time goes by and particular combinations of parameters are identified as being statistically significant, implementation procedures may be written and the master IF ELSE construct extended to identify those cases and call the appropriate sub-procedure.
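
A minimal sketch of such a dispatcher, with hypothetical sub-procedure names and only two parameters shown:

CREATE PROCEDURE mySearchDispatcher
(
@IDCriteria bigint = NULL,
@MaxDateCriteria datetime = NULL
)
AS
IF @IDCriteria IS NOT NULL
    EXEC mySearchByID @IDCriteria            -- hypothetical dedicated sub-procedure
ELSE IF @MaxDateCriteria IS NOT NULL
    EXEC mySearchByMaxDate @MaxDateCriteria  -- hypothetical dedicated sub-procedure
ELSE
    EXEC mySearchDynamic                     -- hypothetical dynamic-SQL fallback for the remaining cases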

David M. Miller
A: 

This is one of the cases where I use code building, or a sproc for each search option. Since your search is so complex, I'd go with code building. You can do this either in client code or with dynamic SQL; just be careful of SQL injection.

Mladen Prajdic
A: 

I suggest one step further than some of the other suggestions - think about degeneralizing at a much higher abstraction level, preferably the UI structure. Usually this seems to happen when the problem is being pondered in data mode rather than user domain mode.

In practice, I've found that almost every such query has one or more non-null, fairly selective columns that would be reasonably optimizable, if one (or more) were specified. Furthermore, these are usually reasonable assumptions that users can understand.

Example: Find Orders by Customer; or Find Orders by Date Range; or Find Orders By Salesperson.

If this pattern applies, then you can decompose your hypergeneralized query into more purposeful subqueries that also make sense to users, and you can reasonably prompt for required values (or ranges), and not worry too much about crafting efficient expressions for subsidiary columns.

You may still end up with an "All Others" category. But at least then if you provide what is essentially an open-ended Query By Example form, then users will have some idea what they're getting into. Doing what you describe really puts you in the role of trying to out-think the query optimizer, which is folly IMHO.

le dorfier
What about David M. Miller's answer (i.e. a few dedicated queries to handle the most common cases, and then a huge query to handle all the remaining cases)?
Brann
That's what I meant by "one step further" - i.e. think about it from the user experience and understanding of the domain context, as well as an efficient partitioning of the aggregated query. (I upvoted his suggestion, BTW.)
le dorfier
A: 

I'm currently working with SQL 2005, so I don't know if the 2008 optimizer acts differently. That being said, I've found that you need to do a couple of things...

  1. Make sure that you are using WITH RECOMPILE for your procedure.

  2. Use CASE statements to cause short-circuiting of the logic. At least in 2005, this is NOT done with ORs. For example:


SELECT
     ...
FROM
     ...
WHERE
     (1 =
          CASE
               WHEN @my_column IS NULL THEN 1
               WHEN my_column = @my_column THEN 1
               ELSE 0
          END
     )

The CASE statement will cause the SQL Server optimizer to recognize that it doesn't need to continue past the first WHEN. In this example it's not a big deal, but in my search procs a non-null parameter often meant searching in another table through a subquery for existence of a matching row, which got costly. Once I made this change the search procs started running much faster.
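
A sketch of that situation, with hypothetical tables and columns (the CASE prevents the EXISTS probe from running at all when the parameter is NULL):

SELECT c.CustomerID, c.CustomerName
FROM dbo.Customers AS c
WHERE (1 =
          CASE
               WHEN @OrderNumber IS NULL THEN 1  -- parameter not supplied: skip the costly probe
               WHEN EXISTS (SELECT 1 FROM dbo.Orders AS o
                            WHERE o.CustomerID = c.CustomerID
                              AND o.OrderNumber = @OrderNumber) THEN 1
               ELSE 0
          END
     )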

Tom H.
"Make sure that you are using WITH (RECOMPILE) for your query" then your query plan will never be cached, surely? And if your query is complicated (20 OR tests were described) making the query plan could take longer than retrieving the actual data!
Kristen
You don't WANT the query plan to be cached. That's the whole point. If it were cached then it would use whichever plan was created, based on the parameters passed the first time the SP was run. You then can't short-circuit the logic. It's unlikely the plan gen will take longer in this case.
Tom H.
A: 

My suggestion is to build the SQL string. You will gain maximum performance from indexes and still reuse execution plans.

DECLARE @sql nvarchar(4000);
SET @sql = N''

-- Append each supplied criterion, preserving the clauses built so far
IF @param1 IS NOT NULL
    SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'param1 = @param1';
IF @param2 IS NOT NULL
    SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'param2 = @param2';
...
IF @paramN IS NOT NULL
    SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'paramN = @paramN';

IF @sql <> N''
    SET @sql = N' WHERE ' + @sql;
SET @sql = N'SELECT ... FROM myTable' + @sql;

EXEC sp_executesql @sql, N'@param1 type, @param2 type, ..., @paramN type', @param1, @param2, ..., @paramN;
chaowman
I agree. Each variation of the query will cause a separate query plan to be cached, and popular ones will be retained in cache. However, you need SELECT permission on the table(s) rather than just EXEC on the sproc. @sql length > 4000 chars is a bug which may stay hidden for a long time!
Kristen
Why do I need SELECT permission on the tables? (I mean, if I put all this construction logic in a stored proc, I only need EXEC permission on the stored proc, and nothing else; or don't I? Is doing an sp_executesql 'select * from A' in a stored proc different from a direct 'select * from A'?)
Brann
A: 

Each time the procedure is called with different parameters, there is a different optimal execution plan for getting the data. The problem is that SQL Server caches an execution plan for your procedure and will reuse a sub-optimal (read: terrible) execution plan.

I would recommend:

  1. Create specific SPs for frequently run execution paths (i.e. passed parameter sets), optimised for each scenario.
  2. Keep your main generic SP for edge cases (presuming they are rarely run), but use the WITH RECOMPILE clause to cause a new execution plan to be created each time the procedure is run (see the sketch below).

We use OR clauses checking against NULLs for optional parameters to great effect. It works very well without the RECOMPILE option, so long as the execution path is not drastically altered by passing different parameters.
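
A minimal sketch of option 2, reusing the question's procedure shape (the name is hypothetical; WITH RECOMPILE trades plan caching for a plan matched to each call's parameters):

CREATE PROCEDURE mySearchProcGeneric
(
@IDCriteria bigint = NULL,
@MaxDateCriteria datetime = NULL
)
WITH RECOMPILE  -- a fresh plan is compiled on every execution
AS
SELECT Col1, Col2 FROM MyTable
WHERE (@IDCriteria IS NULL OR ID = @IDCriteria)
  AND (@MaxDateCriteria IS NULL OR Date < @MaxDateCriteria)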

badbod99