views:

523

answers:

4

Again MSDN does not really explain in plain English the exact difference, or the information for when to choose one over the other.

CHECKSUM

Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.

BINARY_CHECKSUM

Returns the binary checksum value computed over a row of a table or over a list of expressions. BINARY_CHECKSUM can be used to detect changes to a row of a table.

It does hint that binary checksum should be used to detect row changes, but not why.

A: 

Check out the following blog post that highlights the diferences.

http://decipherinfosys.wordpress.com/2007/05/18/checksum-functions-in-sql-server-2005/

John Sansom
Andrew Burns
A: 

A couple of things to watch out for when using these functions:

  1. You need to make sure that the column(s) or expression order is the same between the two checksums that are being compared else the value would be different and will lead to issues.
  2. We would not recommend using checksum(*) since the value that will get generated that way will be based on the column order of the table definition at run time which can easily change over a period of time. So, explicitly define the column listing.
  3. Be careful when you include the datetime data-type columns since the granularity is 1/300th of a second and even a small variation will result into a different checksum value. So, if you have to use a datetime data-type column, then make sure that you get the exact date + hour/min. i.e. the level of granularity that you want.

three checksum functions available :

* CHECKSUM: This was described above.
* CHECKSUM_AGG: This returns the checksum of the values in a group and Null values are ignored in this case. This also works with the new analytic function’s OVER clause in SQL Server 2005.
* BINARY_CHECKSUM: As the name states, this returns the binary checksum value computed over a row or a list of expressions. The difference between CHECKSUM and BINARY_CHECKSUM is in the value generated for the string data-types. An example of such a difference is the values generated for “DECIPHER” and “decipher” will be different in the case of a BINARY_CHECKSUM but will be the same for the CHECKSUM function (assuming that we have a case insensitive installation of the instance). Another difference is in the comparison of expressions. BINARY_CHECKSUM() returns the same value if the elements of two expressions have the same type and byte representation. So, “2Volvo Director 20″ and “3Volvo Director 30″ will yield the same value, however the CHECKSUM() function evaluates the type as well as compares the two strings and if they are equal, then only the same value is returned. Example;

STRING BINARY_CHECKSUM_USAGE CHECKSUM_USAGE

------------------- ---------------------- -----------

2Volvo Director 20 -1356512636 -341465450

3Volvo Director 30 -1356512636 -341453853

4Volvo Director 40 -1356512636 -341455363

Kuya
speak and write and read english is my problem ... so sorry if i wrong meaning
Kuya
@Kuya: If you are going to copy and paste someone elses work. At least do them the courtesy of including a link to the source. http://decipherinfosys.wordpress.com/2007/05/18/checksum-functions-in-sql-server-2005/
John Sansom
yes i copied at that link. :)
Kuya
A: 

Its easy to get collisions from CHECKSUM(). HASHBYTES() was added in SQL 2005 to enhance SQL Server's system hash functionality so I suggest you also look into this as an alternative.

Greg, it would be useful to know the exact differences so one could make an informed decision about the problem at hand. I have already solved my problem but had issues finding more in-depth information so I created this SO page for others with more complex problems.
Andrew Burns
A: 

I've found that checksum collisions (i.e. two different values returning the same checksum) are more common than most people seem to think. We have a table of currencies, using the ISO currency code as the PK. And in a table of less than 200 rows, there are three pairs of currency codes that return the same Binary_Checksum():

  • "ETB" and "EUR" (Ethiopian Birr and Euro) both return 16386.
  • "LTL" and "MDL" (Lithuanian Litas and Moldovan leu) both return 18700.
  • "TJS" and "UZS" (Somoni and Uzbekistan Som) both return 20723.

The same happens with ISO culture codes: "de" and "eu" (German and Basque) both return 1573.

Changing Binary_Checksum() to Checksum() fixes the problem in these cases...but in other cases it may not help. So my advice is to test thoroughly before relying too heavily on the uniqueness of these functions.

Pete