ansaurus

Question

SQL Server 2005, wide indexes, computed columns, and sargable queries

Answer 1

+1 A:

Did you try

WHERE [Name_Indexable]='1,2,3-Propanetriol'

After all that is where your index is created on

SQLMenace 2010-06-02 19:34:17

+1 makes sense to me.

KM 2010-06-02 19:41:15

Yes, sorry. I fixed my example to make it clearer as to what I am trying to achieve.

luksan 2010-06-02 19:41:36

Answer 2

+2 A:

Your index is on name_indexable, not on name. Since name_indexable is generated from a function involving name instead of directly on the column name, the optimizer won't automatically use the index when your where clause includes a reference to name. You must search on name_indexable in order to use the index. Since you have a middle tier, your best bet is probably to provide a function that searches on name_indexable if the given name is <= 200 characters, and otherwise searches on both.

Donnie 2010-06-02 19:42:04

Yeah, this is what I am trying to avoid though.

luksan 2010-06-02 19:46:35

You could try using a `WITH (INDEX xxx)` query hint. I'm not sure if this will actually work or not with the way you're computing the column, but it's worth trying.

Donnie 2010-06-02 19:57:20

@Donnie interesting idea, still results in the same query though. Actually the index does get used in the original query, but it is performing a scan rather than a seek.

luksan 2010-06-02 20:31:50

Answer 3

+2 A:

Make the Name_Index column a persisted computed column and the primary key and enforce uniqueness by appending the ChemicalId instead of relying on triggers.

CREATE TABLE dbo.[Chemical]
    ([ChemicalId] int NOT NULL IDENTITY(1,1), 
    [Name] Nvarchar(max) NOT NULL, 
    [Description] Nvarchar(max) NOT NULL,
    [Name_Index] AS (CONVERT(VARCHAR(20), LEFT([Name], 20)) + CONVERT(VARCHAR(20), [ChemicalId])) PERSISTED PRIMARY KEY);

Registered User 2010-06-02 20:00:13

If making it persisted allows the index to be used, good. If not, maybe you should bite the bullet and make it a "full-time" column.

Philip Kelley 2010-06-02 20:03:18

That would guarantee that any value inserted into Name (even duplicates) could never violate the unique constraint, since the ID is always going to be unique.

luksan 2010-06-02 20:08:37

Goal is not really to make it the primary key, we like our identity keys. I checked the query plan, your solution still does not allow us to cause an index seek without explicitly specifying [Name_Index] in the where clause.

luksan 2010-06-02 20:17:11

Answer 4

A:

Fix your data model. You have a comma-delimited list in the name column, to me that means you would better be able to query if you had a related table. Your name appears to be a list of ingredients not a name.

If this really truly is a real name, then Registered User has a good plan.

HLGEM 2010-06-02 20:05:11

It's not a comma-delimited list, it's a chemical name. Really. I can't make this stuff up.

luksan 2010-06-02 20:14:17

Answer 5

+1 A:

IMHO, yes, I think this is a bad approach. If you knew that the first 20 characters will be unique, then it should be a first class column with a unique constraint. If you want to have better searching on the Name column, then using full-text search is the right way to go. If you want to ensure that the varchar(max) column is unique, then create a computed column that generates a hash off the value and place a unique constraint off that.

Alter Table Add NameHash Hashbytes('SHA1', [Name])

ADDITION

Given our discussion, if your searches are always going to be on a exact match, then you could hash your search parameter and compare it to NameHash above. However, the catch is that the match must be an exact match (i.e. case-sensitive).

I still content that the FTS will be your best bet. Even though there is overhead in breaking up your text into words, FTS is the tool best designed for doing searches against large amounts of text. The longer your search criteria, the more exact it will be, the faster the search.

Thomas 2010-06-02 20:23:36

Yep, we know all about HASHBYTES. Still forces us to write a funky query to exploit the index though.

luksan 2010-06-02 20:29:27

@luksan - How so? You would never use NameHash in your search. If you are searching on name, you would use full-text search and that will definitely use the (full-text) index.

Thomas 2010-06-02 20:31:09

@Thomas If we enabled full-text indexing, yes. Problem is I don't know the overhead of doing that, whether it is as optimized for a scenario like this (matching the entire string as opposed to individual words), whether it is as fast as a regular index, etc. That's why I posted the question.

luksan 2010-06-02 20:34:04

@luksan - Given the data you have show us and how long it is, the benefit would be enormous. FTS breaks up the data into words where "words" are determined by word breaks which are culture specific. In your case, you have obvious word breaks (the commas) in your text so FTS would work very well. In fact, I think the benefit would be enormous. The overhead is minimal and can be adjusted to scale well.

Thomas 2010-06-02 20:38:47

@luksan - Keep in mind that the NameHash computed column I suggested was only to provide a means to ensure uniqueness on a varchar(max) column. It is entirely orthogonal to the need for an index to speed searches on names and for that Full Text Search would be the right solution.

Thomas 2010-06-02 20:40:52

@Thomas query plan indicates full text index is only accessed if we use CONTAINS() function rather than equality operator. But we don't want to look for a substring, we want to match the entire string. Also, seems like the full-text engine would be wasting a lot of CPU time and storage space in attempting to break these silly chemical names up into "words."

luksan 2010-06-02 20:41:58

@luksan - I should also mention that the dashes will likely be treated as word breaks so a search on say "dihydroxy" would be very fast with FTS.

Thomas 2010-06-02 20:42:28

@luksan - If your searches are always on a unique match, then search on the hashed parameter. I.e., call Hashbytes on your input search criteria and search against NameHash.

Thomas 2010-06-02 20:44:29

@Thomas tried to vote your answer up again, instead it voted it down and won't let me change my vote, ugh.

luksan 2010-06-02 21:03:34

@luksan - Try it now (I edited my answer)

Thomas 2010-06-02 22:43:42

@Thomas That worked.

luksan 2010-06-02 23:02:43

We didn't go with the FTS but on further reflection HASHBYTES() seems less kludgy than LEFT(), so thanks!

luksan 2010-06-03 14:45:41

Answer 6

+1 A:

I find your solution from the question (the last query) very good, but I personally prefer to say SQL more exactly what and how I want to do. So if you works with Microsoft SQL Server or with some other SQL Server which supports CTE (common table expression) you can rewrite your query like following:

DECLARE @data nvarchar(max);

SET @data = '[1,1''-Bicyclohexyl]-2-carboxylic acid, 4'',5-dihydroxy-2'',3-dimethyl-5'',6-bis[(1-oxo-2-propen-1-yl)oxy]-, methyl ester';

WITH ReduceData ([ChemicalId], [Name], [Description]) AS (
    SELECT [ChemicalId], [Name], [Description] 
    FROM [dbo].[Chemical]
    WHERE [Name_Indexable]=LEFT(@data,20)
)
SELECT [ChemicalId], [Name], [Description] 
FROM ReduceData
WHERE [Name]=@data

(In the real implementation you probably don't need to define @data. Instead you can just use a parameterized query.). What I suggest is just to say SQL more explicit what you want. All CTE Queries can be very good optimized.

It can be that your original query will be compiled to the absolutely the same execution plan as my CTE version. You can look at the both plans and compare there. In your project you have probably much more complex queries as from your question. If you'll use more CTE, your SQL code will be easy to read, it can be very good optimized and you can be sure that SQL Server do exactly what you want.

UPDATED: By the way the line

ALTER TABLE [Chemical]
    ADD [Name_Indexable] AS LEFT([Name], 20)

should be changed to

ALTER TABLE [Chemical]
    ADD [Name_Indexable] AS CAST(LEFT([Name], 20) AS varchar(20)) PERSISTED

to make a [Name_Indexable] column of the type varchar(20) on the Microsoft SQL Server 2008 and mark it PERSISTED to stores the computed values in the table, and updates them when any other columns on which the computed column depends are updated

Oleg 2010-06-02 23:07:02

I don't think you really need the CTE (just AND the two filter criteria), but the example of having a single parameter used twice is great. I totally support this and furthermore, you can wrap this in a parameterized inline table-valued UDF so that the index usage is enforced transparently. Obviously for set-based join operations, this is a bit more problematic, but common joins could be represented with views with the appropriate LEFT() usage.

Cade Roux 2010-06-03 02:31:46

@Oleg Why does persisting the column matter? The column will be "persisted" in the index, which is the only place where it will really be needed (never will be used in a SELECT). Won't persisting it in the table just waste space? Won't the index nodes automatically be updated if the computed column is not persisted? What is the advantage of persisting?

luksan 2010-06-03 12:07:08

ansaurus

tags:

views:

answers:

SQL Server 2005, wide indexes, computed columns, and sargable queries

related questions