tags:

views:

226

answers:

2

I'm having problems understanding how input to the md5 function in MySQL 4.1.22 is handled. Basically I'm not able to recreate the md5sum of a specific value combination for comparison. I guess it has something to do with the format of the input data.

I have set up a table with two columns of type double(direction and elevation) + a third for storing a md5 sum.

With a setup script i add data to the direction and elevation fields + create a checksum using the following syntax:

insert into polygons (
  direction, 
  elevation, 
  md5sum
) 
values ( 
  (select radians(20.0)), 
  (select radians(30.0)), 
  ( md5( (select radians(20.0)) + (select radians(20.0)) ) )
)

which ends up as: 0.349065850398866, 0.523598775598299, '0c2cd2c2a9fe40305c4e3bd991812df5'

Later I compare the stored md5sum with a newly calculated, the newly calculated is created using md5('0.349065850398866' + '0.523598775598299') and I get the following checksum: '8c958bcf912664d6d27c50f1034bdf34'

If I modify the last decimal in the passed "string" from a 9 to a 8 0.523598775598298 i get the same checksum as previously stored, '0c2cd2c2a9fe40305c4e3bd991812df5'. Any value of the last decimal from 8 down to 0 gives the same checksum.

Using BINARY, (md5( (select BINARY(radians(20.0))) + (select BINARY(radians(20.0))) ) in the setup script creates the same checksum as the my original "runtime calculation"

Worth mentioning is that the original method works for all other rows I have (55)

I guess I'm using the function in a somewhat strange way, but I'm not sure of a better way, so in order to find a better way I feel I need to understand why the current is failing.

+2  A: 

The two numbers you are adding are stored in binary form, but displayed in decimal form. There is no guarantee that you will get exactly the same number back if you give the decimal form back to the machine.

In this case, this causes the addition to give a slightly different result, which gives an entirely different MD5 sum:

mysql> select radians(20.0) + radians(30.0), '0.349065850398866' + '0.523598775598299';
+-------------------------------+-------------------------------------------+
| radians(20.0) + radians(30.0) | '0.349065850398866' + '0.523598775598299' |
+-------------------------------+-------------------------------------------+
|              0.87266462599716 |                          0.87266462599717 | 
+-------------------------------+-------------------------------------------+
1 row in set (0.00 sec)

If you want to consistently get the same result, you need to store the results of radians(20.0) and radians(30.0) in variables somewhere, never relying on their printed representations.

legoscia
Thank you for a good answer. I'm now doing the "run-time" calculation directly in mysql using the data stored in the direction and elevation columns without round tripping to my application.
Kristofer
A: 

The output of radians(20.0) is computed with many more digits than are shown in the printed output. When the result is passed to the md5 function, the full non-truncated value is used, whereas the printed value only will show a limited number of digits. Thus, it is not the same value being passed into the md5 function in the two cases.

Dustin Fineout
Figured. But I guess it would be possible to round trip the values of direction and elevation to the application using a select and then pass it to the md5 function in a new query if I managed to keep the precision(given the column data types are correct), and never downcasted the type, or am I wrong?What data type does "select radians(30.0)" return, and how is it's precision affected when passed to the md5 function (taking strings)
Kristofer