views:

287

answers:

3

I want to compute a checksum of all of the values of a column in aggregate.

In other words, I want to do some equivalent of

md5(group_concat(some_column))

The problem with this approach is:

  1. It's inefficient. It has to concat all of the values of the column as a string in some temporary storage before passing it to the md5 function
  2. group_concat has a max length of 1024, after which everything else will be truncated.

(In case you're wondering, you can ensure that the concat of the values is in a consistent order, however, as believe it or not group_concat() accepts an order by clause within it, e.g. group_concat(some_column order by some_column))

MySQL offers the nonstandard bitwise aggregate functions BIT_AND(), BIT_OR() and BIT_XOR() which I presume would be useful for this problem. The column is numeric in this case but I would be interested to know if there was a way to do it with string columns.

For this particular application, the checksum does not have to be cryptologically safe.

+1  A: 

If the column is numeric, you could do this:

SELECT BIT_XOR(mycolumn) + SUM(mycolumn)

Of course this is easy to defeat, but it will include all the bits in the column.

Jason Cohen
+1  A: 
SELECT  crc
FROM
(
  SELECT @r := MD5(CONCAT(some_column, @r)) AS crc,
         @c := @c + 1 AS cnt
  FROM
    (
    SELECT @r := '', @c := 0
    ) rc,
    (
    SELECT some_column
    FROM mytable
    WHERE condition = TRUE
    ORDER BY
      other_column
    ) k
) ci
WHERE cnt = @c
Quassnoi
+2  A: 

It seems like you might as well use crc32 instead of md5 if you don't care about cryptographic strength. I think this:

select sum(crc32(some_column)) from some_table;

would work on strings. It might be inefficient as perhaps MySQL would create a temporary table (especially if you added an order by).

Jacob Gabrielson