views:

62

answers:

2

I am writing an affiliate system, and I want to generate a unique 32 character wide token, from the url.

The problem is that a URL can be up to 128 chars long (IIRC). Is there a way that I can create a unique 32 char wide key/token from a given URL, without any 'collisions'?

I am not sure if this is an encoding, encryption or hashing problem (probably, a mixture of all three).

I will be implementing this 'mapping function' using PHP, since that is the language I am using to build this particular system. Any suggestions on how to go about doing this?

Is it even possible to map a 128 char string into a 32 char string uniquely (i.e. no collisions?) ...

[Edit]

I just did some reading up, and found that the max length of urls is actually, something in the order of 2K. However, I am not concerned about 'silly' edge cases like that. I am pretty sure that 99.9% of the time, my imposed limit of 128 chars should be sufficient.

+6  A: 

Is it even possible to map a 128 char string into a 32 char string uniquely (i.e. no collisions?) ...

In part. You can use a hash function like md5 or sha1. That's what they were built to do.
MD5 generates a 32 char string, and SHA1 generates a 40 char string.

Of course you can't guarantee that there won't be collisions. That's impossible since the message space is too large for your hashes (there are 21024 messages vs 2128 possible hashes if you are using MD5), but these functions are meant to be collision resistant and hard to reverse.

Wikipedia references:

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/MD5
http://en.wikipedia.org/wiki/SHA-1

NullUserException
+1  A: 

Is it even possible to map a 128 char string into a 32 char string uniquely (i.e. no collisions?) ...

That depends on the alphabet being used for both input and output. If your resulting 32 char hash is limited to an alphabet of a-z, you can encode a maximum of 26^32 = 1.901722×10^45 values in it. A URL can consist of at least a-z and quite a number of other characters, so can contain at least 26^128 = 1.307942×10^181 values. So, an alphabet of 26 characters is not enough.

Using a-zA-Z0-9 you can encode 62^32 = 2.272658×10^57 unique values, which is still not enough. Even an alphabet of 100 characters gives you only 100^32 = 1.0×10^64 possible values.

Depending on what exactly you want to do, you should either increase the length of the hash or rethink the overall approach.

deceze