views:

151

answers:

6

Could anyone recommend a preferred algorithm to use for URL shortening? I'm coding using PHP. Initially I thought about writing something that would start at a character such as "a" and iterate through requests, creating records in a database and therefore having to increment the character to b, c, d ... A, B and so on as appropriate.

However it dawned on me that this algorithm could be pretty heavy/clumsy and there could be a better way to do it.

I read around a bit on Google and some people seem to be doing it with base conversion from the database's ID column. This isn't something I'm too familiar with.

Could someone elaborate and explain to me how this would work? A couple of code examples would be great, too.

I obviously don't want a complete solution as I would like to learn by doing it myself, but just an explanation/pseudo-code on how this would work would be excellent.

Thanks chaps.

+1  A: 

Assuming your PRIMARY KEY is an INT and it auto_increments, the following code will get you going =).

<?php

    $inSQL = "INSERT INTO short_urls() VALUES();";
    $inResult = mysql_query($inSQL);
    $databaseID = base_convert(mysql_insert_id(), 10, 36);

    // $databaseID is now your short URL

?>

EDIT: Included the base_convert from HGF's answer. I forgot to base_convert in the original post.

Raphael Caixeta
Thanks very much. I can understand that, but surely if you just use the database ID, you are constrained to 0-9 of numbers, so only Base10? Wouldn't it be better to somehow utilise a-zA-Z0-9? Or am I having a very blonde moment...
George
+1  A: 

You can use base_convert function to do a base convertion from 10 to 36 with the database IDs.

<?php
   $id = 315;
   echo base_convert($id, 10, 36), "\n";
?>

Or you can reuse some of the ideas presented in the comments on the page bellow:

http://php.net/manual/en/function.base-convert.php

HGF
A: 

i used to break ID by algorithm similar with how to convert from decimal to hex, but it will use 62 character instead of 16 character that hex would use.

'0','1','2','3','4','5','6','7','8','9',
'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'

example : if you will change ID = 1234567890 you will get kv7yl1 as your a key.

Swing Magic
A: 

I adopted a "light" solution. On user request I generate a unique identifier (checking for conflicts in db) with this python snipplet:

url_hash = base64.b64encode(os.urandom(int(math.ceil(0.75*7))))[:6]

and store it in db.

Enrico Carlesso
+2  A: 

Most shortening services just use a counter that is incremented with every entry and convert the base from 10 to 64.

An implementation in PHP could look like this:

function encode($number) {
    return strtr(rtrim(base64_encode(pack('i', $number)), '='), '+/', '-_');
}
function decode($base64) {
    $number = unpack('i', base64_decode(str_pad(strtr($base64, '-_', '+/'), strlen($base64) % 4, '=')));
    return $number[1];
}

$number = mt_rand(0, PHP_INT_MAX);
var_dump(decode(encode($number)) === $number);

The encode function takes an integer number, converts it into bytes (pack), encodes it with the Base-64 encoding (base64_encode), trims the trailing padding = (rtrim), and replaces the characters + and / by - and _ respectively (strtr). The decode function is the inverse function to encode and does the exact opposite (except adding trailing padding).

The additional use of strtr is to translate the original Base-64 alphabet to the URL and filename safe alphabet as + and / need to be encoded with the Percentage-encoding.

Gumbo
Thanks for your suggestions! Plenty to get my teeth into now. I shall have a play with Base64. Thank you.
George
+1  A: 

The native PHP base_convert() works well for small ranges of numbers, but if you really need to encode large values, consider using something like the implementation provided here which will work to base 64 and beyond if you simply provide more legal characters for the encoding.

http://af-design.com/blog/2010/08/10/working-with-big-integers-in-php/

Erik Giberti