views:

37

answers:

3

Hi

I need generate a single hash over some data in a table

CREATE TABLE Table1
(
       F1             INT          UNSIGNED NOT NULL AUTO_INCREMENT,
       F2              INT          default     NULL,
       F3               Varchar(50)  default     NULL,
      ..
       FN              INT          default     NULL,
       PRIMARY KEY (F1)
);

i.e. F1, F3,FN where F2=10

SELECT md5(CONCAT_WS('#',F1,F3,FN)) FROM Tabe1 WHERE F2=10

Gives a Hash for each row in the table.

QUESTIONS

1) How do get a single hash over the whole table?

2) What is the fasts hashing algorithm to use MD5, SHA1, SHA or any other?

EDIT:

Mysql 4.1 is been used - and it does NOT have Trigger Support

+1  A: 

1)

SELECT MD5( GROUP_CONCAT( CONCAT_WS('#',F1,F3,FN) SEPARATOR '##' ) ) FROM Table1

2) Speed doesn't really matters as a function has to run only once and all hash functions are fast enough

Naktibalda
+1  A: 

I would use a MySQL Trigger to detect changes on insert, delete, update, etc.

labratmatt
+1  A: 

As for speed, you should try. It depends on the way the functions are implemented.

Chances are, however, that you will see very little speed differences. The hash functions you cite are all faster than what an average disk can spew out, so the question is not really "what hash function will make the code runs fastest ?" but "what hash function will make the CPU most idle while it waits for the data from the disk ?".

On my Intel Core2 Q6600, clocked at 2.4 GHz (64-bit mode), with my own C implementation of hash functions, I get the following hashing speeds:

  • MD5: 411 MB/s
  • SHA-1: 336 MB/s
  • SHA-256: 145 MB/s
  • SHA-512: 185 MB/s

That's using a single core only. My hard disks top at about 100 MB/s, so one can say that even with SHA-256, the hashing process will use no more than 17% of the machine CPU power. Of course, nothing guarantees that the implementation used by MySQL is that fast, which is why you should try. Also, in 32-bit mode, SHA-512 performance decreases quite a bit.

Cryptographically speaking, (grave) weaknesses have been found in MD5 and SHA-1, so if you work in a security-relevant setting (i.e. you want to detect changes even if there is someone who can choose some of the changes and would prefer that you do not detect said changes), you should stick to SHA-256 or SHA-512, which, as far as we know, are robust enough. MD5 and SHA-1 are still fine in non-security situations, though.

Thomas Pornin