views:

96

answers:

3

In Django Design Patterns, the author recommends using zlib.crc32 to mask primary keys in URLs. After some quick testing, I noticed that crc32 produces negative integers about half the time, which seems undesirable for use in a URL. zlib.adler32 does not appear to produce negatives, but is described as "weaker" than CRC.

  1. Is this method (either CRC or Adler-32) safe for usage in a URL as an alternate to a primary key? (i.e. is it collision-safe?)
  2. Is the "weaker" Adler-32 a satisfactory alternative for this task?
  3. How the heck do you reverse this?! That is, how do you determine the original primary key from the checksum?
+1  A: 

You can interpret the 32 bit CRC value as an unsigned integer.

compie
Good to know. What about collisions and reversibility?
David Eyk
A: 

Upon further investigation, this seems like a really bad idea:

In [11]: s = set([zlib.crc32(str(x)) for x in xrange(20000000)])
In [12]: len(s)
Out[12]: 19989760
In [13]: 20000000 - len(s)
Out[13]: 10240

That's 10,240 collisions in 20,000,000 primary keys.

David Eyk
+1  A: 

The problem is not hashing the value. The problem is mapping the hash back to the key. Even if there are collisions you could always increment until you hit an unused hash.

The reason hashes are used for e.g. authentication is because there's already a key (e.g the username) that can be used to find the appropriate record. At that point it becomes just a matter of comparing the given hash against the stored hash. If you're using the hash to mask the key instead then it will be trickier than just comparing it. Turning the hash itself into the key will solve this though.

Ignacio Vazquez-Abrams