views:

28

answers:

3

I'm not sure how to better phrase this question so it's possible I missed a previously asked question. Feel free to close this and point me to the correct one if it exists.

I have a table with two important columns (that is to say it has much more but only two is pertinent to this question). First column is a GUID (an id) and the second is a nvarchar (storing an URL). The combination of the ID and the URL has to be unique (so a same guid can be repeated but each row has a different URL and vice-versa but there cannot be more than one row of the same guid and URL).

Currently, before every INSERT, I do a SELECT to see if there exists a row with the same id and URL. However it looks like lookups on the nvarchar is slow. Therefore I think I will update the table to store an extra column which is filled in with the hash (SHA1) of the URL upon insertion. Now we only do a lookup on the smaller hash (varbinary?) which I assume will be significantly faster than before.

Is there a way to get SQL Server 2008 to automatically store the hash and do a lookup against that hash value instead of the actual text? I'm assuming that the indecies are b-trees, so what I'm asking for is for SQL Server to create the b-tree with the hash values of the text in the nvarchar field and when a select is run, it should calculate the hash and do a lookup in the tree with the hash value. Is this possible?

A: 

could you just put a unique constraint on the table for those two columns and perform the insert inside of a try / catch block?

It would save you from the extra work of calculating the hash, and the extra space of storing it

Mike Forman
A: 

You can have a trigger that calculates the hash on insert and update and puts it in if required.

In terms of stopping the insert just add a unique index on them

Preet Sangha
+2  A: 

If you do lookups on your (id, url) fields - do you have an index on those two columns?? If not - add one and see if that speeds up your lookups enough.

If not: yes, you can definitely get this functionality automagically - the magic word is: computed column.

In SQL Server, you can have columns that compute their values automatically, based on a formula you provide. This can be either just a simple arithmetic formula, or you can call a stored function to compute the value.

In order to make this fast for your checks, you would have to make sure you can make that computed column persisted - then you can index it, too. This excludes larger scale computations - the formula has to be clear, concise, and deterministic.

So, do this:

ALTER TABLE dbo.YourTable
  ADD HashValue AS CAST(HASHBYTES('SHA1', CAST(ID AS VARCHAR(36)) + Url) AS VARBINARY(20)) PERSISTED

Now your table has a new HashValue column (call it whatever you like), and you can select that value and inspect it.

Next put an index on that new column

CREATE NONCLUSTERED INDEX IX_Hash_YourTable
  ON dbo.YourTable(HashValue)

Now your lookup should be flying!

marc_s
Thanks for your help! That's very useful. I have one more question. I created the Hash field on just the URL itself and created a index on id,urlhash and ran a query that looked like this:select * from table where id='..' and urlhash=HASHBYTES('SHA1','url.com')This gave me no results. I had to search on id='...' and url='url.com' to get the results. Does this end up using the index I created on the hash? I tried looking at the execution plan and it says it did a index scan on the PK and a "Compute Scalar". What's that mean?
royrules22
@royrules22: first of all: did you make the computed field PERSISTED?? If not, then the hash will be computed each time to access the field. Second: if you have an index (id,urlhash), then that index **cannot** be used if you search for just "urlhash".
marc_s
It is PERSISTED and I always search for ID,URLHash combo. Anyway I figured it out. Turns out when testing I was testng HASHBYTES('SHA1','url.com') and 'url.com' is a varchar which produces a different hash than nvarchar. To simulate an nvarchar literal I had to do N'url.com'. Whoops.
royrules22