What's the best way to do base36 arithmetic in Perl?

To be more specific, I need to be able to do the following:

  • Operate on positive N-digit numbers in base 36 (i.e. the digits are 0-9 and A-Z)

    N is finite, say 9

  • Provide basic arithmetic, at the very least the following 3:

    • Addition (A+B)

    • Subtraction (A-B)

    • Whole division, e.g. floor(A/B).

    • Strictly speaking, I don't need base10 conversion: the numbers will be in base36 100% of the time. So I'm fine with a solution that does NOT implement conversion between base36 and base10.

I don't much care whether the solution is brute-force "convert to base 10 and back" or converting to binary, or some more elegant approach "natively" performing baseN operations (as stated above, to/from base10 conversion is not a requirement). My only 3 considerations are:

  1. It fits the minimum specifications above

  2. It's "standard". Currently we're using an old homegrown module, based on base10 conversion done by hand, that is buggy and sucks.

    I'd much rather replace that with some commonly used CPAN solution instead of re-writing my own bicycle from scratch, but I'm perfectly capable of building it if no better standard possibility exists.

  3. It must be fast-ish (though not lightning fast). Something that takes 1 second to sum up 2 9-digit base36 numbers is worse than anything I can roll on my own :)

P.S. Just to provide some context in case people decide to solve my XY problem for me in addition to answering the technical question above :)

We have a fairly large tree (stored in DB as a bunch of edges), and we need to superimpose order on a subset of that tree. The tree dimensions are big both depth- and breadth-wise. The tree is VERY actively updated (inserts, deletes and branch moves).

This is currently done by having a second table with 3 columns: parent_vertex, child_vertex, local_order, where local_order is a 9-character string built of A-Z0-9 (i.e. a base 36 number).

Additional considerations:

  • It is required that the local order is unique per child (and obviously unique per parent),

  • Any complete re-ordering of a parent is somewhat expensive, so the implementation tries to assign, for a parent with X children, orders that are roughly evenly distributed between 0 and 36**9-1 (the largest 9-digit value), so that almost no tree inserts result in a full re-ordering.
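
To make the "evenly distributed" idea concrete, here is a minimal sketch of how such keys could be generated. The names `encode_order` and `spaced_orders` are hypothetical, and the uppercase 0-9 A-Z alphabet and fixed key width are assumptions taken from the description above, not the actual implementation.

```perl
use strict;
use warnings;

# Assumed digit alphabet: 0-9 followed by A-Z.
my @DIGITS = (0 .. 9, 'A' .. 'Z');

# Encode a non-negative integer as exactly $width base-36 digits.
sub encode_order {
    my ($n, $width) = @_;
    my $str = '';
    while (length($str) < $width) {
        $str = $DIGITS[$n % 36] . $str;
        $n   = ($n - $n % 36) / 36;    # exact integer division
    }
    return $str;
}

# Spread $count keys evenly over the $width-digit key space,
# leaving a gap before the first key and after the last one.
sub spaced_orders {
    my ($count, $width) = @_;
    my $step = int(36 ** $width / ($count + 1));
    return map { encode_order($_ * $step, $width) } 1 .. $count;
}

print join(' ', spaced_orders(3, 2)), "\n";    # 90 I0 R0
```

Because the keys are fixed-width and digits sort before letters in ASCII, plain string comparison orders them the same way as their numeric values.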

+1  A: 

I would bet my money on converting to base10 and back.

If you don't have to do this very often and the numbers are not very large, that is the easiest (and thus least complex => fewest bugs) way to do it.

Of course, another way to do it is to also store the base10 number for computation purposes only; however, I'm not sure if this is possible or has any advantage in your case.
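
A minimal sketch of that round trip, assuming the uppercase 0-9 A-Z alphabet and the fixed 9-digit width from the question. The `b36_*` names are made up for illustration. 36**9-1 is about 1.0e14, so the values fit comfortably in a native Perl number.

```perl
use strict;
use warnings;

# Assumed alphabet and key width, taken from the question.
my @DIGITS = (0 .. 9, 'A' .. 'Z');
my %VALUE  = map { $DIGITS[$_] => $_ } 0 .. $#DIGITS;
my $WIDTH  = 9;

# Decode a base-36 string into a native integer.
sub b36_decode {
    my ($str) = @_;
    my $n = 0;
    for my $ch (split //, uc $str) {
        die "invalid base-36 digit '$ch'\n" unless exists $VALUE{$ch};
        $n = $n * 36 + $VALUE{$ch};
    }
    return $n;
}

# Encode a non-negative integer as a zero-padded base-36 string.
sub b36_encode {
    my ($n) = @_;
    die "negative value\n" if $n < 0;
    my $str = '';
    do {
        $str = $DIGITS[$n % 36] . $str;
        $n   = ($n - $n % 36) / 36;    # exact integer division
    } while ($n > 0);
    die "result exceeds $WIDTH digits\n" if length($str) > $WIDTH;
    return ('0' x ($WIDTH - length $str)) . $str;
}

# Decode, compute natively, encode the result.
sub b36_add { b36_encode(b36_decode($_[0]) + b36_decode($_[1])) }
sub b36_sub { b36_encode(b36_decode($_[0]) - b36_decode($_[1])) }

sub b36_div {
    my ($x, $y) = map { b36_decode($_) } @_;
    die "division by zero\n" if $y == 0;
    return b36_encode(($x - $x % $y) / $y);    # floor(A/B)
}

print b36_add('00000000Z', '000000001'), "\n";    # 000000010
print b36_div('0000000ZZ', '000000002'), "\n";    # 0000000HZ
```

Subtraction that would go below zero and results wider than 9 digits die rather than wrap around; whether that is the right policy depends on the application.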

Henri
Computers prefer binary or hex, but I think the point stands with that caveat. Convert to a native number, do your computation, then switch it back.
Joel
Hex is for humans.
friedo
Hex is for humans? I prefer counting in decimal. Hex is just for a compact representation of decimals. Moreover, hex (=16) is a power of two, and power of two is not for humans in general. 2 hex nibbles = 1 byte, that is not by coincidence. ;)
Henri
Hex is an easier way for humans to see groups of 8 bits. The computer doesn't care about hex at all, but humans don't read binary very well. I read hexdumps quite frequently. It's a lot easier to deal with characters (even Unicode) by looking at their hex representation rather than their decimal representation.
brian d foy
+9  A: 

What about Math::Base36?

daotoad
Thanks - I was so sure that "base36" is a weird thing never used for anything practical I didn't even consider searching for that term! Duh.
DVK
Always search first, ask later.
brian d foy
Math::Base36 is straightforward, but really really slow. Benchmark it...
drewk
+8  A: 

I am assuming that Perl core modules are OK?

How about doing the arithmetic with native integers, converting from base 36 with POSIX::strtol() and converting the result back to base 36 afterwards?

There is HUGE variability in speed between the different methods of converting to/from base 36. strtol is 80x faster than Math::Base36::decode_base36, for example, and the conversion subs in the listing below are 2 to 4x faster than Math::Base36. They also support any integer base up to 62 (easily extended by adding characters to the @nums array).

Here is a quick benchmark:

#!/usr/bin/perl
use POSIX;
use Math::BaseCnv;
use Math::Base36 ':all';
use Benchmark;

{
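    # Digit alphabet: 0-9, then a-z (values 10-35), then A-Z (values 36-61),
    # so base-36 strings must use lowercase letters with these subs.
    my @nums = (0..9,'a'..'z','A'..'Z');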
    $chr=join('',@nums);
    my %nums = map { $nums[$_] => $_ } 0..$#nums;

    sub to_base
    {
        my ($base, $n) = @_;
        return $nums[0] if $n == 0;
        return $nums[0] if $base > @nums;   # base exceeds the digit alphabet
        my $str = ''; 
        while( $n > 0 )
        {
            $str = $nums[$n % $base] . $str;
            $n = int( $n / $base );
        }
        return $str;
    }

    sub fr_base
    {
        my ($base,$str) = @_;
        my $n = 0;

        return 0 if $str=~/[^$chr]/;

        foreach ($str =~ /[$chr]/g)
        {
            $n *= $base;
            $n += $nums{$_};
        }
        return $n;
    }
}

$base=36;   
$term=fr_base($base,"zzz");

for(0..$term) { push @numlist, to_base($base,$_); }

timethese(-10, {
        'to_base' => sub { for(0..$#numlist){ to_base($base,$_); }  },
        'encode_base36' => sub { for(0..$#numlist){ encode_base36($_); }  },
        'cnv->to 36' => sub { for(0..$#numlist){ cnv($_,10,$base); }  },
        'decode_base36' => sub { foreach(@numlist){ decode_base36($_); }  }, 
        'fr_base' => sub { foreach(@numlist){ fr_base($base,$_); }  },
        'cnv->to decimal' => sub { foreach(@numlist){ cnv($_,$base,10); }  },
        'POSIX' => sub { foreach(@numlist){ POSIX::strtol($_,$base);}},
} );
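
For the question's use case, here is a small sketch of how the strtol route might look with input checking. In list context `POSIX::strtol` returns the parsed number and the count of characters it could not parse. `parse_b36` and `floor_div_b36` are hypothetical helper names, and the sketch assumes a platform where C `long` is 64 bits, so that 9-digit values fit.

```perl
use strict;
use warnings;
use POSIX ();

# Parse a base-36 string, rejecting anything strtol did not fully consume.
# strtol accepts both upper- and lowercase letters as digits.
sub parse_b36 {
    my ($str) = @_;
    my ($num, $unparsed) = POSIX::strtol($str, 36);
    die "not a base-36 number: '$str'\n"
        if $str eq '' || $unparsed != 0 || $num < 0;
    return $num;
}

# floor(A/B) on two base-36 strings, returned as a native integer.
sub floor_div_b36 {
    my ($x, $y) = map { parse_b36($_) } @_;
    die "division by zero\n" if $y == 0;
    return ($x - $x % $y) / $y;
}

print parse_b36('ZZ'), "\n";             # 1295
print floor_div_b36('ZZ', '2'), "\n";    # 647
```

The result can be turned back into a base-36 string with a sub such as to_base() from the listing above, keeping in mind that it emits lowercase letters.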
drewk
Now that you mention it, I remember perusing the standard library docs for C many moons ago and being pleasantly surprised that `strtol()` handled base 36 numbers. Yet I didn't even think of looking in the `POSIX` module. Since POSIX is just a thin wrapper around the C library, it's not surprising that it is so fast. Good call.
daotoad