In my application, I save the content of URLs into a table in the database. To minimize duplication, I want to compute a checksum for each piece of content. What is the best SQL Server data type for storing checksums, and what is the fastest way to compute checksums for the contents (HTML) of URLs?

+2  A: 

SHA1 could be used to calculate the checksum. The result is a byte array, which could be stored in SQL either as a hex string or in a binary (blob) field, but I think for practical reasons a string would be more convenient.
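As a rough illustration of the two storage options, here is a minimal Python sketch (the question does not say which language the application uses, so Python and the helper name `content_checksum` are assumptions). It computes a SHA1 digest of some HTML and returns both the raw bytes and the hex string. A SHA1 digest is always 20 bytes, so the hex form is always 40 characters, which suggests fixed-width columns such as BINARY(20) or CHAR(40).

```python
import hashlib


def content_checksum(content):
    """Return the SHA1 digest of `content` as (raw bytes, uppercase hex string).

    `content_checksum` is a hypothetical helper name used for illustration.
    """
    # Hash the UTF-8 encoding of the text; the same encoding must be used
    # every time, or identical pages will produce different checksums.
    hasher = hashlib.sha1(content.encode("utf-8"))
    return hasher.digest(), hasher.hexdigest().upper()


raw, hex_text = content_checksum("<html><body>Hello</body></html>")
print(len(raw), len(hex_text))  # prints "20 40"
```

The raw form takes half the space of the hex form; the hex form is easier to read and compare by eye.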

Darin Dimitrov
+1  A: 

You can use a built-in function in SQL Server, HashBytes, to compute any of these: MD2, MD4, MD5, SHA, or SHA1.

Examples:

SELECT HashBytes('MD5', 'http://www.cnn.com')

That returns the varbinary value 0xC50252F4F24784B5D368926DF781EDE9

SELECT CONVERT(VARCHAR(32),HashBytes('MD5', 'http://www.cnn.com'),2)

That returns the varchar value C50252F4F24784B5D368926DF781EDE9

Now all you have to do is pick whether you want varchar or varbinary and use that for your column.

See Generating a MD2, MD4, MD5, SHA, or SHA1 hash by using HashBytes

SQLMenace
OK, this is a good approach, but there is a limitation: the maximum length of the input is 8000 bytes.
Sadegh
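One possible way around the 8000-byte input limit mentioned above is to compute the hash in the application before inserting the row, since a typical hashing library has no such cap. The sketch below is in Python (an assumption, as the thread does not name the application language) and feeds the content to the hasher in chunks, so a large page never has to be hashed in one call. The function name `checksum_stream` and the chunk size are hypothetical choices.

```python
import hashlib
import io


def checksum_stream(stream, chunk_size=8192):
    """Return the raw SHA1 digest of everything readable from a binary stream."""
    hasher = hashlib.sha1()
    # Read fixed-size chunks until read() returns an empty bytes object.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.digest()


# A 40,000-byte page, well over the 8000-byte limit discussed above.
page = b"<p>x</p>" * 5000
digest = checksum_stream(io.BytesIO(page))
```

The resulting 20-byte digest can then be passed to the INSERT as a parameter for a binary column, or converted to hex first if a character column is preferred.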