views:

127

answers:

5

I need to create a hash key on my tables for uniqueness and someone mentioned to me about md5. But I have read about checksum and binary sum; would this not serve the same purpose? To ensure no duplicates in a specific field. Now I managed to implement this and I see the hask keys in my tables. Do I need to alter index keys originally created since I created a new index key with these hash keys? Also do I need to change the keys? How do I change my queries for example SELECT statements? I guess I am still unsure how hash keys really help in queries other than uniqueness?

+1  A: 

If your goal is to ensure no duplicates in a specific field, why not just apply a unique index to that field and let the database engine do what it was meant to do?

Quick Joe Smith
I did and this is easy to use; but my team leader wants the hash key used since the tables will contain hundreds of thousands of records indexed by the string field.
vbNewbie
@vbNewbie: your team lead is a muppet. @Quick Joe Smith: perfect
gbn
thank you gbn, I agree but since I am the new guy...
vbNewbie
@vbNewbie: Well, show your team lead our answers and comments...
gbn
+1  A: 
Joel Coehoorn
Thank you. That helps. I did implement a md5 function through a sql server user defined function and it works. So from your post can I assume that the hash keys have no influence on primary key selections? I initially had unique auto incremented keys as primary keys for my table and the only thing I have changed after implementing the hash function is to create a index on the same field used in the hash function. Also if were to select records from 1 or more tables using a basic query to match the primary and foreign keys, then I would not use the hash key explicitly right?
vbNewbie
only in the event of trying to match a query based on the string field used in the hash function
vbNewbie
+1  A: 

It makes no sense to write a unique function to replace SQL Server unique constraints/indexes.

How are you going to ensure the hash is unique? With a constraint?

If you index it (which may not be allowed because of determinism), then the optimiser will treat it as non-unique. As well as killing performance.

And you only have a few 100,000 rows. Peanuts.

Given time I could come up with more arguments, but I'll summarise: Don't do it

gbn
So what is the purpose of md5 and other hash functions besides for cryptography - I understand the basic use of SQL server indexes etc. it has served me well in other applications but I dont have a choice here. Or did he mean something else; perhaps implementing some hash function in my vb script to avoid duplicates but that still avoids sql indexes and constraints, right? Oh and it actually is millions of records
vbNewbie
I've no idea why you have this requirement. But if it's for managing unique values in a database table, it's the wrong solution. I have a 10 (or so) column unique index on a 200 million row table...
gbn
A: 

Cryptographically save Hash functions are one way functions and they consume more resources (CPU cycles) that functions that are not cryptographically secure. If you just need function as hash key you do not need such property. All you need is low probability for collisions what is related whit uniformity. Try whit CRC or if you have strings or modulo for numbers.

http://en.wikipedia.org/wiki/Hash_function

ralu
A: 

why don't you use a GUID with a default of NEWSEQUENTIALID() ..don't use NEWID() since it is horrible for clustering, see here: Best Practice: Do not cluster on UniqueIdentifier when you use NewId

make this column the primary key and you are pretty much done

SQLMenace