views:

215

answers:

6

I need an efficient commify filter or routine for use with Template::Toolkit. It is to be used many times on the page. It should support decimals.

This one is found in The Perl Cookbook:

sub commify {
    my $text = reverse $_[0];
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}

Are there more efficient ways?

+4  A: 

I think that if you are worried about speed of this - you are seriously misplacing your worries.

The commify function works on my desktop 130000 times per second on number like: "31243245356.4432".

this means that if you have on your page 10000 numbers, commification of it will take 76ms. And template toolkit processing of the page will probably take 2-3 times as long.

depesz
+7  A: 

Before you try to optimize anything, be sure its actually a problem. Use a profiler to find the problem areas in your code and focus on those areas.

That commify solution is about as good as you can get, but there are other things you might do if you need to bypass it:

  • Use something like Memoize to cache results if you are commifying the same numbers repeatedly
  • Pre-compute all the numbers if they are unlikely to change.
  • Cache processed templates when you can
  • Use a reverse proxy setup with your webserver to hand off heavy processing to backend servers.
brian d foy
+1  A: 

Another option is:

sub commify {
    my $text = shift;
    1 while $text =~ s/ ( \d ) ( \d{3} ) (\D|\z) /$1,$2$3/xms;
    return $text;
}

When it comes to deciding which is faster, the Benchmark module is very useful.

#!/usr/bin/perl

use strict;
use warnings;
use Benchmark;

sub your_commify {
    my $text = reverse 100000000;
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}

sub my_commify {
    my $text = 100000000;
    1 while $text =~ s/ ( \d ) ( \d{3} ) (\D|\z) /$1,$2$3/xms;
    return $text;
}

timethese(
    -10,
    {
        'yours' => \&your_commify,
        'mine'  => \&my_commify,
    }
);

Runing this gives:

~$ ./benchmark.pl
Benchmark: running mine, yours for at least 10 CPU seconds...
mine: 10 wallclock secs (10.01 usr +  0.01 sys = 10.02 CPU) @ 111456.89/s (n=1116798)
yours: 11 wallclock secs (10.04 usr +  0.00 sys = 10.04 CPU) @ 250092.33/s (n=2510927)

Looks like yours is ~2.25 times faster! (When using the "at least 10 CPU seconds" mode you have to check the values of "n" used.)

So it looks like you have to keep searching... but remember to use Benchmark!

Kaoru
+2  A: 

I agree with brian and depesz: from a practical point of view, this function is probably not the place to start if you're trying to improve the performance of your app. That said, it is possible to write a much faster commify function. One way is to avoid regular expressions.

use strict;
use warnings;
use Benchmark qw(cmpthese);

sub commify_regex {
    my $text = reverse $_[0];
    $text =~ s{(\d\d\d)(?=\d)(?!\d*\.)}{$1,}g;
    return scalar reverse $text;
}

sub commify_substr {
    my $v = $_[0];
    my $len = length $v;
    my $dec = index($v, '.');
    my $i = 3 + ($dec < 0 ? 0 : $len - $dec);
    $len -- unless $v == abs($v);
    while ($i < $len ++){
        substr($v, -$i, 0, ',');
        $i += 4;    
    }
    return $v;
}

my @tests = qw(
    1 12 123 1234 12345 123456 1234567 
    12345678 123456789 1234567890 12345678901 
    123456789012 1234567890123
);
push @tests, map "$_.$_", @tests;
push @tests, map - $_,    @tests;

for my $t (@tests){
    print "Incorrect for: ", $t, "\n"
        unless commify_substr($t) eq commify_regex($t);
}

cmpthese( -2, {
    regex  => sub { commify_regex($_)  for @tests },
    substr => sub { commify_substr($_) for @tests },
});

# Output on my Windows machine.

         Rate  regex substr
regex  3277/s     --   -54%
substr 7109/s   117%     --
FM
I also agree that its not possible to improve the performance much by rewriting this single routine, but its interesting to see the solutions. BTW, the substr version can be further improved (about 7%), see my own answer..
eugene y
+1  A: 
sub commify {
    my $n = $_[0];

    my $s = abs($n) != $n;
    my $x = index($n, '.');
    $x = length($n) if $x == -1;
    substr($n, $x, 0, ',') while ($x -= 3) > $s;
    return $n;
}
eugene y
+1 Yes, that is better. You can speed things up even more (at least on my Windows machine) with two changes. (1) Don't use array assignment to get the arg: `my $n = $_[0]`. (2) Use the 4-argument form of `substr`, which allows you to eliminate one of your `substr` calls: `substr($n, $x, 0, ',') while ($x -= 3) > $s`.
FM
@FM Thanks! Updated the answer.
eugene y
A: 

This can be done using single regexp:

$number =~ s/(?<=\d)(?=(?:\d\d\d)+(?!\d))/,/g;
Ivan Nevostruev