I have a script to convert to base 62 (A-Za-z0-9) but how do I get a number out of MD5?

I have read in many places that the number from an MD5 hash is bigger than PHP can handle as an integer, so the result will be inaccurate. As I want a short URL anyway, I was not planning on using the whole hash, maybe just 8 characters of it.

So my question is: how do I get part of an MD5 hash as a number?

Also, is it a bad idea to use only part of the MD5 hash?

+4  A: 

I'm going to suggest something different here. Since you are only interested in a numeric chunk of the MD5 hash, why not use a shorter numeric hash such as CRC32 or Adler-32? Here is an example:

$hash = sprintf('%u', crc32('your string here'));

This will produce an unsigned 32-bit hash of your string, printed as a decimal number of up to 10 digits.

EDIT: I think I misunderstood you, here are some functions that provide conversions to and from bases up to 62.

EDIT (Again): PHP's native integers are too small for arbitrary-length numbers, so use either the BCMath or the GMP extension. Here is a function that uses BCMath and can convert between any bases from 2 to 62 (bc_base_convert is that user-defined function, not a PHP built-in). You should use it like this:

echo bc_base_convert(md5('your url here'), 16, 62); // public base 62 hash

and the inverse:

echo bc_base_convert('base 62 encoded value here', 62, 16); // private md5 hash
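The linked function isn't reproduced here, so the following is a rough stand-in rather than the original. It needs neither extension: it does schoolbook long division on an array of digits. The name `base_convert_big` and the digit order `0-9a-zA-Z` are assumptions made for this sketch (the question's script may order its alphabet differently), and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper, not the bc_base_convert() linked from the answer.
// Converts a non-negative number given as a string between bases 2..62.
// Assumed digit order: 0-9, then a-z, then A-Z (so hex input must be lowercase).
function base_convert_big(string $number, int $fromBase, int $toBase): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    if ($number === '' || $fromBase < 2 || $fromBase > 62 || $toBase < 2 || $toBase > 62) {
        throw new InvalidArgumentException('Empty input or base outside 2..62');
    }

    // Turn each character into its numeric digit value.
    $digits = [];
    foreach (str_split($number) as $char) {
        $value = strpos($alphabet, $char);
        if ($value === false || $value >= $fromBase) {
            throw new InvalidArgumentException("Invalid digit '$char' for base $fromBase");
        }
        $digits[] = $value;
    }

    // Repeatedly divide the whole number by $toBase; each remainder is the
    // next output digit, least significant first.
    $result = '';
    while (count($digits) > 0) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $digit) {
            $acc = $remainder * $fromBase + $digit;
            $q = intdiv($acc, $toBase);
            $remainder = $acc % $toBase;
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q; // skip leading zeros of the quotient
            }
        }
        $result = $alphabet[$remainder] . $result;
        $digits = $quotient;
    }
    return $result;
}

$hex   = md5('your url here');
$short = base_convert_big($hex, 16, 62);
// Leading zeros are lost in the round trip, so pad back to 32 hex digits.
$back  = str_pad(base_convert_big($short, 62, 16), 32, '0', STR_PAD_LEFT);
```

A full 128-bit MD5 comes out at up to 22 base-62 characters, so this shortens the hash but not dramatically.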

Hope it helps. =)

Alix Axel
Is it possible to work out what went into the hash? I'm thinking that if I only ever show part of a hash, it must be more difficult to work out how it was generated... right?
Mark
Right, but then it wouldn't be a hash in the true sense of the word. Collisions are also much more likely to occur.
Alix Axel
+1  A: 

You can do it like this: (Not all steps are in PHP; it's been a long time since I've used it.)

There's no risk in using only a few of the bits of an MD5 hash. All that changes is the danger of collisions, which goes up as you keep fewer bits.
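As a minimal sketch of taking just part of the hash: the first 8 hex characters of an MD5 are 32 bits, which `hexdec()` turns into an ordinary integer on 64-bit PHP. The function name `md5_prefix_int` is made up for this example, and the 64-bit assumption matters, since on a 32-bit build the value may not fit in an int.

```php
<?php
// Hypothetical helper: integer value of the first $hexDigits hex characters
// of md5($input). Assumes 64-bit PHP; up to 15 hex digits (60 bits) always
// fits in a native int there.
function md5_prefix_int(string $input, int $hexDigits = 8): int
{
    if ($hexDigits < 1 || $hexDigits > 15) {
        throw new InvalidArgumentException('hexDigits must be between 1 and 15');
    }
    return (int) hexdec(substr(md5($input), 0, $hexDigits));
}

$n = md5_prefix_int('http://example.com/some/long/url');
// $n can now go through your existing base-62 converter.
```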

Georg
Nice link, thanks.
Alix Axel
A: 

You could use a slightly modified Base 64 with - and _ instead of + and /:

function base64_url_encode($str) {
    return strtr(base64_encode($str), array('+'=>'-', '/'=>'_'));
}
function base64_url_decode($str) {
    return base64_decode(strtr($str, array('-'=>'+', '_'=>'/')));
}

Additionally you could remove the trailing padding = characters.

And to get the raw MD5 value (binary string), set the second parameter (named $raw_output in the manual) to true:

$raw_md5 = md5($str, true);
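Putting those pieces together might look like the sketch below. The function name `md5_base64url` is an assumption for illustration; it only combines `md5()` in raw mode with the character swap and padding removal described above.

```php
<?php
// Hypothetical helper combining the pieces above: raw MD5 bytes, URL-safe
// Base64, trailing "=" padding removed.
function md5_base64url(string $str): string
{
    $raw = md5($str, true); // 16 raw bytes
    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');
}

echo md5_base64url('your url here'), "\n"; // always 22 characters
```

Sixteen bytes encode to 24 Base64 characters, two of which are padding, which is where the fixed length of 22 comes from.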
Gumbo
check this http://stackoverflow.com/questions/352434/base-conversion-of-arbitrary-sized-numbers-php/1743486#1743486
Alix Axel
What’s wrong? Why the down-vote?
Gumbo
+1  A: 

If it's possible, I'd advise not using a hash for your URLs. Eventually you'll run into collisions, especially if you're truncating the hash. If you implement an ID-based system where each item has a unique ID, there will be far fewer headaches. The first item will be 1, the second will be 2, and so on. If you're using MySQL, just add an AUTO_INCREMENT column.
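To put a rough number on the collision risk, the birthday approximation p ≈ 1 - exp(-n(n-1) / (2N)) estimates the chance of at least one collision among n items hashed into N = 2^bits values. The function name below is made up for this sketch, and it assumes the truncated hash is uniformly distributed.

```php
<?php
// Hypothetical helper: birthday-bound estimate of the probability that at
// least two of $items uniformly distributed hashes of $bits bits collide.
function collision_probability(int $items, int $bits): float
{
    $space = 2.0 ** $bits;
    // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -expm1(-$items * ($items - 1) / (2.0 * $space));
}

printf("%.4f\n", collision_probability(10000, 32)); // 32-bit hash, 10,000 URLs
printf("%.4f\n", collision_probability(77163, 32)); // close to an even chance
```

With only 32 bits kept, the odds reach roughly one in two at about 77,000 URLs, which is why a unique ID is the safer choice once the table grows.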

To make a short id:

//the basic example
$sid = base_convert($id, 10, 36);

//if you're going to be needing 64 bit numbers converted 
//on a 32 bit machine, use this instead
$sid = gmp_strval(gmp_init($id, 10), 36);

To turn a short id back into the base-10 id:

//the basic example
$id = base_convert($sid, 36, 10);

//if you're going to be needing 64 bit numbers
//on a 32 bit machine, use this instead
$id = gmp_strval(gmp_init($sid, 36));

Hope this helps!

If you truly want base 62 (which base_convert can't do, since it stops at base 36; GMP accepts bases up to 62 only from PHP 5.3.2 on), check this out: http://snipplr.com/view/22246/base62-encode--decode/
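In case the link goes away, a base-62 encoder and decoder for integer IDs can be sketched as follows. This is not the linked snippet: the function names and the digit order `0-9a-zA-Z` are assumptions, and IDs must fit in a native PHP int.

```php
<?php
// Assumed digit order: 0-9, a-z, A-Z. Function names are hypothetical.
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Encode a non-negative integer ID as a base-62 string.
function base62_encode_id(int $id): string
{
    if ($id < 0) {
        throw new InvalidArgumentException('ID must be non-negative');
    }
    $out = '';
    do {
        $out = BASE62_ALPHABET[$id % 62] . $out;
        $id = intdiv($id, 62);
    } while ($id > 0);
    return $out;
}

// Decode a base-62 string back into the integer ID.
function base62_decode_id(string $sid): int
{
    if ($sid === '') {
        throw new InvalidArgumentException('Short ID must not be empty');
    }
    $id = 0;
    foreach (str_split($sid) as $char) {
        $value = strpos(BASE62_ALPHABET, $char);
        if ($value === false) {
            throw new InvalidArgumentException("Invalid character '$char'");
        }
        $id = $id * 62 + $value;
    }
    return $id;
}

$sid = base62_encode_id(123456789);
$id  = base62_decode_id($sid); // back to 123456789
```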

brianreavis
GMP, nice one! =)
Alix Axel
Mark
Fair enough! My bad. Don't mind me :)
brianreavis
A: 

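You can do something like this:

// raw 16-byte MD5, split into four unsigned 32-bit integers
$hash = md5("The data to be hashed", true);
$ints = unpack("L*num", $hash);

// base62() stands for your own base 62 conversion script
$hash_str = base62($ints['num1']) . base62($ints['num2']) . base62($ints['num3']) . base62($ints['num4']);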
ZZ Coder
A: 

There actually is a Java implementation which you could probably extract. It's part of an open-source CMS solution called Pulse.

Look here for the code of toBase62() and fromBase62().

http://pulse.torweg.org/javadoc/src-html/org/torweg/pulse/util/StringUtils.java.html

The only dependency in StringUtils is the LifeCycle class, which provides a way to get a salted hash for a string. You might omit that altogether, or just copy the method over to your copy of StringUtils. Voilà.

Timo Mika Gläßer