I'm writing a web app that points to external links. I'm looking to create a non-sequential, non-guessable id for each document that I can use in the URL. I did the obvious thing: treating the URL as a string and calling String#crypt on it, but that seems to choke on any non-alphanumeric characters, like slashes, dots and underscores.

Any suggestions on the best way to solve this problem?

Thanks!

A: 

Use Digest::MD5 from Ruby's standard library:

require 'digest/md5'
Digest::MD5.hexdigest(my_url)  # my_url is the URL string
+10  A: 

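Depending on how long a string you would like, you can use one of a few alternatives. Note that Digest.hexencode only hex-encodes the input, so its output is reversible and grows with the URL; the others produce fixed-length digests: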

require 'digest'
Digest.hexencode('http://foo-bar.com/yay/?foo=bar&a=22')
# "687474703a2f2f666f6f2d6261722e636f6d2f7961792f3f666f6f3d62617226613d3232"

require 'digest/md5'
Digest::MD5.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "43facc5eb5ce09fd41a6b55dba3fe2fe"

require 'digest/sha1'
Digest::SHA1.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "2aba83b05dc9c2d9db7e5d34e69787d0a5e28fc5"

require 'digest/sha2'
Digest::SHA2.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "e78f3d17c1c0f8d8c4f6bd91f175287516ecf78a4027d627ebcacfca822574b2"

Note that none of these are unguessable on their own: anyone who knows the URL can compute the same digest. You may have to combine the URL with some other (secret but static) data to salt the string:

salt = 'foobar'
Digest::SHA1.hexdigest(salt + 'http://foo-bar.com/yay/?foo=bar&a=22')
# "dbf43aff5e808ae471aa1893c6ec992088219bbb"

Now it becomes much harder for someone who doesn't know the salt and has no access to your source to generate this hash.
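A keyed hash (HMAC) is a common alternative to prepending a salt by hand. The following is only a sketch of that idea, not part of the answer above: it assumes Ruby's bundled openssl library is available, and `SECRET_KEY` and `url_id` are hypothetical names chosen for illustration.

```ruby
require 'openssl'

# Hypothetical secret for illustration; a real app would load it from
# configuration rather than hard-coding it in the source.
SECRET_KEY = 'replace-with-a-long-random-secret'

# Returns a 64-character hex id derived from the URL and the secret key.
# The same URL and key always give the same id.
def url_id(url, key = SECRET_KEY)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, url)
end

id = url_id('http://foo-bar.com/yay/?foo=bar&a=22')
puts id.length
```

As with the salted digest, the id is deterministic, so the same URL always maps to the same id as long as the key stays the same.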

manveru
+1  A: 

I would also suggest looking at the different algorithms in the Digest namespace. To make the result harder to guess, rather than (or in addition to) salting with a secret passphrase, you can also mix in a high-resolution timestamp:

require 'digest/md5'
def hash_url(url)
  Digest::MD5.hexdigest("#{Time.now.to_f}--#{url}")
end

Since the result of any hashing algorithm is not guaranteed to be unique, don't forget to check your result against previously generated hashes before assuming that it is usable. Using Time.now makes the retry trivial to implement: because the input changes on every call, you only have to call the method again until it produces a hash that has not been used. Keep in mind that the result is no longer derived from the URL alone, so it has to be stored alongside the document.
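The retry described above could look roughly like this. It is a minimal sketch: `unique_hash_for` is a hypothetical helper, and the `taken` set stands in for whatever lookup the app really uses (most likely a database query against stored ids).

```ruby
require 'digest/md5'
require 'set'

# Same method as above, repeated so the example runs on its own.
def hash_url(url)
  Digest::MD5.hexdigest("#{Time.now.to_f}--#{url}")
end

# Keeps generating candidates until one is not already in `taken`,
# records it there, and returns it.
def unique_hash_for(url, taken)
  loop do
    candidate = hash_url(url)
    # Set#add? returns nil when the element is already present.
    return candidate if taken.add?(candidate)
  end
end

taken = Set.new
first  = unique_hash_for('http://foo-bar.com/yay/?foo=bar&a=22', taken)
second = unique_hash_for('http://foo-bar.com/yay/?foo=bar&a=22', taken)
puts first != second
```

With a real database, a unique index on the id column would be the safer check, since two processes could otherwise pass the lookup at the same time.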

webmat