How to easily encode and "compress" a URL/e-mail address to a string in PHP?

The string should be:

  1. difficult for a user to decode
  2. as short as possible (compressed)
  3. different for similar URLs (similar inputs should not give similar outputs)
  4. not stored in a database
  5. easy to decode/uncompress by a PHP script


Example (input -> output): stackoverflow.com/1/ -> "n3uu399", stackoverflow.com/2/ -> "ojfiejfe8"

+4  A: 

Not very short, but you could zip it with a password and encode the result using base64. Note that zip password protection is not very strong, but it should be OK if your encrypted value is intended to have a short lifetime.

Note that you won't be able to generate a reasonably safe encoding unless you agree to store some inaccessible information locally (for example, a secret key on the server). Whatever you do, assume that anyone can get at the pseudo-encrypted data given enough time, be it by reverse-engineering your algorithm, brute-forcing your passwords or whatever else is necessary.
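A minimal sketch of this idea, under some loud assumptions: instead of a password-protected zip it uses gzdeflate() for compression and AES-256-CTR from the openssl extension for the secret part, then URL-safe base64. The constant TOKEN_SECRET and the function names encode_token()/decode_token() are hypothetical, and the sketch does not authenticate the token, so treat it as an illustration rather than a finished scheme.

```php
<?php
// Hypothetical secret; it must stay on the server and never reach the client.
const TOKEN_SECRET = 'replace-with-a-long-random-secret';

// Compress, encrypt, then encode as URL-safe base64.
function encode_token(string $plain): string
{
    $key = hash('sha256', TOKEN_SECRET, true);  // 32-byte key derived from the secret
    $iv  = random_bytes(16);                    // fresh IV for every token
    $enc = openssl_encrypt(gzdeflate($plain, 9), 'aes-256-ctr', $key, OPENSSL_RAW_DATA, $iv);
    return rtrim(strtr(base64_encode($iv . $enc), '+/', '-_'), '=');
}

// Reverse of encode_token(); returns null when the token is malformed.
function decode_token(string $token): ?string
{
    $b64  = strtr($token, '-_', '+/');
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);  // restore stripped padding
    $raw  = base64_decode($b64, true);
    if ($raw === false || strlen($raw) < 17) {
        return null;                                      // needs a 16-byte IV plus data
    }
    $key  = hash('sha256', TOKEN_SECRET, true);
    $data = openssl_decrypt(substr($raw, 16), 'aes-256-ctr', $key, OPENSSL_RAW_DATA, substr($raw, 0, 16));
    if ($data === false) {
        return null;
    }
    $plain = @gzinflate($data);                           // garbage input fails to inflate
    return $plain === false ? null : $plain;
}
```

Because the IV is random, encoding the same URL twice gives two different tokens, which covers the "similar URLs should look different" requirement at the cost of 16 extra bytes per token. For inputs as short as a typical URL, the compression step is unlikely to save much.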

soulmerge
A: 

If you have access to a database then you could do a relational lookup, i.e. a table with two fields: the first holding the original URL and the second holding the compressed URL. To generate the second value you could do something like the following:

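// Alphabet to pick random characters from.
$str = "a b c d e f g h i j k l m n o p q r s t u v w x y z";

$str = explode(" ", $str);
$len = 5;
$url = ""; // initialise before appending, otherwise PHP reports an undefined variable

for($i = 0; $i < $len; $i++)
{
    // Pick a random index into the alphabet and append that letter.
    $pos = rand(0, (count($str) - 1));
    $url .= $str[$pos];
}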

This is just an idea that I have thought up; the code isn't tested.

Marc Towler
4. not in database
SeanJA
I think you would be very likely to get collisions (i.e. the same code for different URLs). Also, you don't need to split the string into an array: you can access string offsets using [] anyway. It would also be better to call count() outside the loop and store the result.
Tom Haigh
+1  A: 

You could make your own text compression system based on common strings: if the URL starts with 'http://www.', the first character of the shortened URL is 'a'; if it starts with 'https://www.', the first character is 'b' (and so on for other popular variants); otherwise the first character is 'z' and the URL follows in a coded pattern.

Then, if the next three letters are 'abc', the second letter is 'a', etc. You'll need a list of the letter pairs/triplets that are most common in URLs, and to work out the most popular 26, 50, etc. (depending on which characters you want to use). That lets you compress the URL entirely in PHP, without using a database. People will only be able to reverse it by knowing your letter pair/triplet list (your mapping list) or by manually reverse-engineering it.
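A minimal sketch of the prefix step only, assuming a hypothetical four-entry mapping table (PREFIX_CODES) and hypothetical function names. The pair/triplet substitution for the rest of the URL is left out, and a real table would have to be built from your own URL statistics.

```php
<?php
// Hypothetical mapping: one code letter per common URL prefix.
// Longer prefixes come first so 'http://www.' wins over 'http://'.
// The letter 'z' is reserved for "no known prefix".
const PREFIX_CODES = [
    'a' => 'http://www.',
    'b' => 'https://www.',
    'c' => 'http://',
    'd' => 'https://',
];

// Replace a known prefix with its one-letter code, or mark the URL with 'z'.
function shorten_prefix(string $url): string
{
    foreach (PREFIX_CODES as $code => $prefix) {
        if (strncmp($url, $prefix, strlen($prefix)) === 0) {
            return $code . substr($url, strlen($prefix));
        }
    }
    return 'z' . $url;
}

// Turn the leading code letter back into the prefix it stands for.
function expand_prefix(string $short): string
{
    if ($short === '') {
        return '';
    }
    $prefix = PREFIX_CODES[$short[0]] ?? '';  // 'z' or unknown code: no prefix
    return $prefix . substr($short, 1);
}
```

With this table, 'http://www.example.com/1/' becomes 'aexample.com/1/' and expands back to the original. The result is only as secret as the mapping table itself.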

Richy C.
+1  A: 

Here is a simple implementation that may or may not fulfil your needs:

Input/Output:

test@test.com    cyUzQTEzJTNBJTIydGVzdCU0MHRlc3QuY29tJTIyJTNC             test@test.com
http://test.com/ cyUzQTE2JTNBJTIyaHR0cCUzQSUyRiUyRnRlc3QuY29tJTJGJTIyJTNC http://test.com/

Code:

function encode ($in) {
    return base64_encode(rawurlencode(serialize($in)));
}

function decode ($in) {
    return unserialize(rawurldecode(base64_decode($in)));
}

shrug

You need to be more specific about your inputs and outputs and what you expect from each.

You could also use gzcompress/gzuncompress instead of serialize/unserialize, etc.
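A rough sketch of that gzcompress/gzuncompress variant, assuming the zlib extension is available. The function names encode_gz()/decode_gz() and the URL-safe base64 step are assumptions added here for illustration.

```php
<?php
// Compress with zlib, then encode as URL-safe base64 without padding.
function encode_gz(string $in): string
{
    return rtrim(strtr(base64_encode(gzcompress($in, 9)), '+/', '-_'), '=');
}

// Reverse of encode_gz(); returns false when the input is not valid.
function decode_gz(string $in)
{
    $b64  = strtr($in, '-_', '+/');
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);  // restore stripped padding
    $raw  = base64_decode($b64, true);
    return $raw === false ? false : @gzuncompress($raw);
}
```

This is obfuscation only: anyone who recognises base64 and zlib can reverse it. For inputs as short as a typical URL, the zlib header and checksum can outweigh the savings, so the output may not be shorter than the input.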

Nick Presta