tags:

views:

113

answers:

3

I have been working with a piece of code for youtube style URL's but I have found a bug and I am hoping someone can show me the most efficient way to fix it.

function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
    static $passcache;
        if(empty($passcache))
                $passcache = array();

    $index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
    if (!empty($passKey)) {
        // Although this function's purpose is to just make the
        // ID short - and not so much secure,
        // with this patch by Simon Franz (http://blog.snaky.org/)
        // you can optionally supply a password to make it harder
        // to calculate the corresponding numeric ID

                if(isset($passcache[$passKey]))
                        $index = $passcache[$passKey];
                else {
                        if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
                                $passhash = hash('sha512',$passKey);

                        $p = str_split($passhash);

                        array_multisort($p, SORT_DESC, $i);
                        $index = implode($i);
                        $passcache = $index;
                }
    }

    $base = strlen($index);

    if ($to_num) {
        // Digital number <<-- alphabet letter code
        $in = strrev($in);
        $out = 0;
        $len = strlen($in) - 1;
        for ($t = 0; $t <= $len; $t++) {
            $bcpow = bcpow($base, $len - $t);
            $out += strpos($index, $in[$t]) * $bcpow;
        }

        if (is_numeric($pad_up)) {
            $pad_up--;
            if ($pad_up > 0) {
                $out -= pow($base, $pad_up);
            }
        }
    } else {
        // Digital number -->> alphabet letter code
        if (is_numeric($pad_up)) {
            $pad_up--;
            if ($pad_up > 0) {
                $in += pow($base, $pad_up);
            }
        }

        $out = "";
        for ($t = floor(log10($in) / log10($base)); $t >= 0; $t--) {
                $bcp = bcpow($base, $t);
            $a = floor($in / $bcp);
            $out .= $index[$a];
            $in -= $a *  $bcp;
        }
        $out = strrev($out); // reverse
    }

    return $out;
}

The bug is only when encoding a single number 238328 as it is my base to the power of three. As a result it divides exactly and because of use of 'floor' it goes unnoticed and the script tries to add the 62nd character which does not exist and only produces a three character code rather than four... thus 'aa' is the result rather than 'aaab'.

Here is the problem part of the code:

        for ($t = floor(log10($in) / log10($base)); $t >= 0; $t--) {
                $bcp = bcpow($base, $t);
            $a = floor($in / $bcp);
            $out .= $index[$a];
            $in -= $a *  $bcp;

And to make it even easier here is the call to get the error

echo alphaID(238328);

credits: Orginally written by Kevin Vanzonneveld: kevin dot vanzonneveld dot net, modified by Simon Franz: blog dot snaky dot org and optimised by Stackoverflows very own mattbasta

+1  A: 

Here you go:

function preciseDivision($x,$y)
{
    // Correct floor's failures by adding a bit of overhead
    $epsilon = 0.00000001;
    return floor(($x/$y) + $epsilon);
}
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
    static $passcache;
    if(empty($passcache))
            $passcache = array();

    $index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
    if (!empty($passKey)) {
       // Although this function's purpose is to just make the
       // ID short - and not so much secure,
       // with this patch by Simon Franz (http://blog.snaky.org/)
       // you can optionally supply a password to make it harder
       // to calculate the corresponding numeric ID

               if(isset($passcache[$passKey]))
                       $index = $passcache[$passKey];
               else {
                       if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
                               $passhash = hash('sha512',$passKey);

                       $p = str_split($passhash);

                       array_multisort($p, SORT_DESC, $i);
                       $index = implode($i);
                       $passcache = $index;
               }
   }

   $base = strlen($index);

   if ($to_num) {
       // Digital number <<-- alphabet letter code
       $in = strrev($in);
       $out = 0;
       $len = strlen($in) - 1;
       for ($t = 0; $t <= $len; $t++) {
           $bcpow = bcpow($base, $len - $t);
           $out += strpos($index, $in[$t]) * $bcpow;
       }

       if (is_numeric($pad_up)) {
           $pad_up--;
           if ($pad_up > 0) {
               $out -= pow($base, $pad_up);
           }
       }
   } else {
       // Digital number -->> alphabet letter code
       if (is_numeric($pad_up)) {
           $pad_up--;
           if ($pad_up > 0) {
               $in += pow($base, $pad_up);
           }
       }

       $out = "";

       for ($t = preciseDivision(log10($in),log10($base)); $t >= 0; $t--) {

           $bcp = bcpow($base, $t);

           $a = preciseDivision($in, $bcp);
           $out .= $index[$a];
           $in -= $a *  $bcp;
       }
       $out = strrev($out); // reverse
   }

   return $out;
}

The problem here was not floor, but floating point precision. The division resulted 2.99999999, and the floor(2.999999) is equal to 2, not 3. This happens because limited size of floating point variables.

That's why it did not work.

I wrote a function preciseDivision, which automatically adds a very small value to the division, to get through this.

And I still believe that there should exist cleaner solutions to this url hashing problem. I'll see what I can do.

Saulius Lukauskas
its a youtube style url generator
ma long
This solves the $a but not the $t... any ideas?
ma long
What does a youtube style url generator do?
Saulius Lukauskas
Its used for converting big numerical id's in urls into shorter alhpa numerical strings..it takes an int and 62 base converts it... turning a normal number 0-9 into a-z-A-Z-0-9.
ma long
think tinyurl.com
ma long
A: 

Adding another answer as the first one is also working, though this one is cleaner.

I got rid of BC Math functions. If you're going to work with really large integers, this may not work. Otherwise, this is much cleaner solution:

function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
    static $passcache;
        if(empty($passcache))
                $passcache = array();

    $index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
    if (!empty($passKey)) {
        // Although this function's purpose is to just make the
        // ID short - and not so much secure,
        // with this patch by Simon Franz (http://blog.snaky.org/)
        // you can optionally supply a password to make it harder
        // to calculate the corresponding numeric ID

                if(isset($passcache[$passKey]))
                       $index = $passcache[$passKey];
                else {
                        if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
                                $passhash = hash('sha512',$passKey);

                        $p = str_split($passhash);

                        array_multisort($p, SORT_DESC, $i);
                        $index = implode($i);
                        $passcache = $index;
                }
    }

    $base = strlen($index);

    if ($to_num) {
        // Digital number <<-- alphabet letter code

        // A conversion from base $base to base 10

        $out = 0;           // End number
        $shift = 1;         // Starting shift
        $len = strlen($in); // Length of string

        for ($t = 0; $t < $len; $t++) 
        {
            $out += strpos($index, $in[$t]) * $shift; // $out is a number form alphabet * base^shift
            $shift *= $base;  // increase shift
        }       


        if (is_numeric($pad_up)) {
           $pad_up--;
           if ($pad_up > 0) {
               $out -= pow($base, $pad_up);
            }
        }
    } else {
        // Digital number -->> alphabet letter code
        if (is_numeric($pad_up)) {
            $pad_up--;
            if ($pad_up > 0) {
                $in += pow($base, $pad_up);
            }
        }

        $out = "";

        // A simple conversion from base 10 to base $base

        while ($in > 0)
        {
            $remainder = $in % $base;
            $in = intval(($in-$remainder)/$base);

            $out .= $index[$remainder];
        }

    }

    return $out;
}

The code is cleaner, and should be faster as well. Now it is much easier to see that this is only conversion from base 10 to base $base (62?) and vica-versa. It does not involve floating point division, thus it does not have the bug mentioned above.

If you need multiplying large integers, and so on, this can be implemented this way as well with some bright thinking.

Added BC Maths, as you said you need large integers

function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
   static $passcache;
       if(empty($passcache))
               $passcache = array();

   $index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
   $i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
   if (!empty($passKey)) {
       // Although this function's purpose is to just make the
       // ID short - and not so much secure,
       // with this patch by Simon Franz (http://blog.snaky.org/)
       // you can optionally supply a password to make it harder
       // to calculate the corresponding numeric ID

               if(isset($passcache[$passKey]))
                      $index = $passcache[$passKey];
               else {
                       if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
                               $passhash = hash('sha512',$passKey);

                       $p = str_split($passhash);

                       array_multisort($p, SORT_DESC, $i);
                       $index = implode($i);
                       $passcache = $index;
               }
   }

   $base = strlen($index);

   if ($to_num) {
       // Digital number <<-- alphabet letter code

       // A conversion from base $base to base 10

       $out = '0';           // End number
       $shift = 1;         // Starting shift
       $len = strlen($in); // Length of string

       for ($t = 0; $t < $len; $t++) 
       {
           $out = bcadd($out, bcmul(strpos($index, $in[$t]),$shift)); // $out is a number from alphabet * base^shift
           $shift = bcmul($shift, $base);  // increase shift
       }       


       if (is_numeric($pad_up)) {
          $pad_up--;
          if ($pad_up > 0) {
              $out -= pow($base, $pad_up);
           }
       }
   } else {
       // Digital number -->> alphabet letter code
       if (is_numeric($pad_up)) {
           $pad_up--;
           if ($pad_up > 0) {
               $in += pow($base, $pad_up);
            }
        }

        $out = "";

        // A simple conversion from base 10 to base $base

        while ($in > '0') // We're treating integer as a string, so BC math works
       {
            $remainder = bcmod($in,$base);
            $in = bcdiv($in, $base);

            $out .= $index[$remainder];
        }

    }

    return $out;
}
Saulius Lukauskas
I found that the highest number it will sucessfully do is 2147483647... how can I make it go higher (just in case)?
ma long
ma long
This does not work, as the number 2147483647 = 2^31 -1 is the largest integer that can be stored in signed 32bits which is a normal size of an Integer.If really really need to do bigger numbers, then use BC maths (Ill edit my post to add the solution). But if I were you, I would ask myself whether you will actually ever need more than 2 billion of id's to encode?That would take about 268 MB of data to store only the numeric representations of them all.
Saulius Lukauskas
I would also like to add that using BC maths is way less efficient than using built in maths (the maths that work only up to 2^31-1).
Saulius Lukauskas
Wow... you are a genius :D Thank you very much
ma long
+1  A: 

As per my answer to your other question, try replacing log10($in) / log10($base) with log($in, $base).

This avoids the inaccuracies associated with dividing the results of the two logarithms as floating point numbers and gives you the correct result.

Phil Ross