views:

133

answers:

5

I have a SQL 2008 R2 Database with about 2 million rows in one of the tables and am struggling with the performance of a specific query when using parameterized SQL.

In the table, there's a field containing a name in it:

[PatientsName] nvarchar NULL,

There's also a simple index on the field:


CREATE NONCLUSTERED INDEX [IX_Study_PatientsName] ON [dbo].[Study] 
(
    [PatientsName] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [INDEXES]
GO

When I do this query in the management studio, it takes around 4 seconds to execute:


declare @StudyPatientsName nvarchar(64)
set @StudyPatientsName= '%Jones%'

SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName

But, when I execute this query:


SELECT COUNT(*) FROM Study WHERE Study.PatientsName like '%Jones%'

it takes a bit more than a half second to execute.

Looking at the execution plans, the query without parameterization does an Index Scan using the above mentioned index, which obviously is efficient. The parameterized query uses the index, but does a range seek on the index.

Part of the issue is having the leading wildcard. When I remove the leading wildcard, both queries return in a fraction of a second. Unfortunately, I do need to support leading wildcards.

We have a home grown ORM that does parameterized queries where the problem originated. These queries are done based on input from a user, so parameterized queries make sense to avoid things like a SQL injection attack. I'm wondering if there's a way to make the parameterized query function as well as the non-parameterized query?

I've done some research looking at different ways to give hints to the query optimizer, trying to force the optimizer to redo the query plan on each query, but haven't found anything yet to improve the performance. I tried this query:


SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName
OPTION ( OPTIMIZE FOR (@StudyPatientsName = '%Jones%'))

which was mentioned as a solution in this question, but it didn't make a difference.

Any help would be appreciated.

+2  A: 

I think your best chance of improving performance here is to look into using a full text index.

Joe Stefanelli
I would concur that this is a viable solution to the problem. In my particular instance, its not practical to use a full text index, but this would be another way to get around the quirk in SQL Server it seems I've discovered.
Steve Wranovsky
A: 

I'm having trouble finding the documentation to verify this, but IIRC, COUNT(*) does a full table scan in MS SQL (as opposed to using a cached value). If you run it against a column which cannot be null and/or has an index defined, I believe (again, still can't find docs to confirm, so I could be off base here) that will be faster.

What happens when you modify the query to something like:

SELECT COUNT(id) FROM Study WHERE Study.PatientsName Like @StudyPatientsName

or

SELECT COUNT(PatientsName) FROM Study 
WHERE Study.PatientsName 
LIKE @StudyPatientsName
AllenG
I tried a query select count(Guid) and select count(PatientsName) and in both cases, the queries were slow. I also tries just a select * from Study with the same where clause, and it also took roughly the same amount of time to execute.
Steve Wranovsky
+3  A: 

It seems like you want to force a scan. There is a FORCESEEK hint but I couldn't see any analogous FORCESCAN hint. This should do it though.

SELECT COUNT(*) 
FROM Study 
WHERE Study.PatientsName + '' like @StudyPatientsName

Although maybe you could try the following on your data and see how it works out .

SELECT COUNT(*) 
FROM Study 
WHERE Study.PatientsName  like @StudyPatientsName
option (recompile)
Martin Smith
He could probably force the scan with an index hint as well. `SELECT COUNT(*) FROM Study with (index(IX_Study_PatientsName)) WHERE Study.PatientsName like @StudyPatientsName`
Joe Stefanelli
@Joe - It is using the index but doing a range seek on it. I think the plan he is getting is like this `declare @StudyPatientsName nvarchar(64) set @StudyPatientsName= '%a%' SELECT COUNT(*) FROM master.dbo.spt_values WHERE type like @StudyPatientsName` http://img826.imageshack.us/img826/3977/executionplan.jpg
Martin Smith
Ah, never mind then. The index usage wasn't clear to me.
Joe Stefanelli
Adding the empty string to the field make (e.g., +'') does work, although it definitely feels like a hack. Adding "option (recompile)" does not improve the performance.I really thought when I initially looked into this that using "option (recompile)" would resolve the issue. there must be something going on with SQL that its not forcing a scan of the index in this case. In any case, I'll give it another day to see if there's other suggested answers, but it looks like this should help solve the issue for me.
Steve Wranovsky
@Steve - I got fairly strange results with `recompile` as well. It did seem to cause the plan to change for different inputs but I think it must use the statistics in some way and I couldn't really work out whether it was for good or bad.
Martin Smith
A: 

Maybe you can try this:

declare @StudyPatientsName nvarchar(64)
set @StudyPatientsName= 'Jones'

select count(*) 
from Study 
where Study.PatientsName like '%' + @StudyPatientsName + '%'

you can also add with(nolock) and it might help:

select count(*) 
from Study  with(nolock)
where Study.PatientsName like '%' + @StudyPatientsName + '%'
koderoid
No, don't ever use WITH(NOLOCK) as a generic solution to "there is a problem with performance but I don't know what it is" problem.
erikkallen
with(nolock) helps speed up the query, and it's not a bad thing as long as you're aware of you data. Whether it is fine to use with(nolock) or data is too critical and can't use with "with(nolock)". why would you even down voted my answer? although, it's a possible solution to his or other's issue.
koderoid
Becasue it is a bad thing many times. You are getting dirty data. This is a critical bug in many systems so it should not be a recommended answer as a general solution.
HLGEM
For completeness sake, I did try both of the above suggestions, and neither resolved the issue. Queries were still in the 4.5 second range on my data.
Steve Wranovsky
@HLGEM, with the way the question was asked by the person, it seems to me that he's capable of evaluating the answer and the use of function(s) if it'll be feasible for his use or not. Also, he is getting the count(*) not which is good enough to put with(nolock) since it's not going to do any hold to the object. That means updates/inserts/deletes are still good. BTW, the amount of data is 2 million rows plus he's using "Like".
koderoid
and the person doing the asking is not the only consumer of the answer.
HLGEM
@HLGEM, that is correct, that is why there's a check mark to be checked in case the person who've asked accepted it. My point is why it was even down voted when it is a possible solution and that means if others will see it as a possible solution, it'll be their responsibility to determine if that'll be useful to them or not. We provide possibilities.
koderoid
@Steve, if that's the case, it might be the issue of LIKE clause itself. As suggested by Joe, FULL Text Search is probably what you need. You might also want to check about http://lucene.apache.org/java/docs/index.html
koderoid
I downvoted it for the exact reason in my first comment: You are suggesting with(nolock) as a generic performance problem solver. It is, however, obvious to anyone who knows query plans that the issue is that the optimizer chooses an index seek + bookmark lookup when a clustered index scan would be better. Therefore, your answer doesn't help the OP, and it is outright dangerous if someone googles this question and blindly applies your "solution".
erikkallen
Now I understand. Thank you.
koderoid
A: 

If all else fails you could try

SELECT COUNT(*) FROM Study WITH(INDEX(0)) WHERE Study.PatientsName like @StudyPatientsName

Perhaps you could wrap it in an IF

IF substring(@StudyPatientsName, 1, 1) = '%'
    SELECT COUNT(*) FROM Study WITH(INDEX(0)) WHERE Study.PatientsName like @StudyPatientsName
ELSE
    SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName

Edit: As martin pointed out, for this specific query this is probably not the best way to do it since an index scan of the existing index is likely faster. It might be applicable in similar situations, though.

erikkallen
I agree that making a special case for leading wildcards would be a good idea. I don't think the index hint would make any difference though. The issue is that it uses the correct index but in an inefficient way that seems to add to the amount of I/O. To reproduce try `set statistics io on declare @StudyPatientsName nvarchar(64) set @StudyPatientsName= '%%' SELECT COUNT(*) FROM master.dbo.spt_values WHERE type like @StudyPatientsName` then compare plan and I/0 with `WHERE type+'' like @StudyPatientsName`
Martin Smith
According to the docs, INDEX(0) will force a table/clustered index scan (not seek). I doubt you can do better than this. The problem is that there is no way to use an index to do a LIKE with a leading wildcard.
erikkallen
My answer does better than this! The nonclustered index `IX_Study_PatientsName` should completely cover this query so that's the one that needs to be scanned. This will presumably be a lot narrower as it only contains the name column and a scan of it will involve a lot less I/O.
Martin Smith
@Martin: You are right. I'll let the answer stand with this remark, though.
erikkallen