I ran a small experiment, shown below, and it looks like a while loop is faster than a for loop in Perl. But since the experiment was rather crude, and the subject might be a lot more complicated than it seems, I'd like to hear what you have to say about this. Thanks as always for any comments/suggestions :)

In the following two small scripts, I've tried while and for loops separately to calculate the factorial of 100,000. The one that has the while loop took 57 minutes 17 seconds to finish while the for loop equivalent took 1 hour 7 minutes 54 seconds.

Script that has while loop:

use strict;
use warnings;
use bigint;

my $now = time;

my $n = shift;
my $s = 1;

while (1) {
    $s *= $n;
    $n--;
    last if $n == 2;
}

print $s*$n;
$now = time - $now;
printf("\n\nTotal running time: %02d:%02d:%02d\n\n", int($now / 3600),
            int(($now % 3600) / 60), int($now % 60));

Script that has for loop:

use strict;
use warnings;
use bigint;

my $now = time;

my $n = shift;
my $s = 1;

for (my $i = 2; $i <= $n; $i++) {
    $s = $s * $i;
}

print $s;
$now = time - $now;
printf("\n\nTotal running time: %02d:%02d:%02d\n\n", int($now / 3600),
            int(($now % 3600) / 60), int($now % 60));
+5  A: 

I would be fall-down shocked if there were actually any "real" difference between while and for loops. Assuming they are doing the "exact" same thing, the interpreter should optimize them to be more or less identical.

I would wager that the difference was probably nothing more than other processes that were vying differently for resources during the two executions.

Even if there was a difference, don't get caught up in The Sad Tragedy of Micro-Optimization Theater.

Morinar
@Morinar, I've just finished the article you suggested. I see the point you're making, thanks.
Mike
+6  A: 

The loops are not equivalent, and you are primarily thrashing the bigint package; the difference has nothing to do with for vs while per se.

The while loop uses the notation '$s *= $n' but the for loop uses '$s = $s * $i'. It is simple enough to demonstrate that the compound assignment and the plain multiply-then-assign are not identical. Also, one loop counts up; the other counts down. This affects how big the numbers to be multiplied are. It is a second-order effect, but not completely negligible.
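
One way to see that the two forms are not identical is to look at what happens to the object behind $s. The sketch below is only an illustration: it uses Math::BigInt directly (the class that bigint uses for integer literals) together with refaddr from Scalar::Util, and the variable names are made up. It suggests that '$s *= $i' updates the existing object in place, while '$s = $s * $i' builds a new object on every pass. That extra copy of an ever-growing number is a plausible, though not proven here, source of the extra time.

```perl
use strict;
use warnings;
use Math::BigInt;
use Scalar::Util qw(refaddr);

# Illustrative only: compare the object's address before and after each form.
my $s = Math::BigInt->new(6);
my $i = Math::BigInt->new(7);

my $addr0 = refaddr($s);
$s *= $i;                        # overloaded '*=' works on the existing object
my $in_place = (refaddr($s) == $addr0) ? 1 : 0;

my $addr1 = refaddr($s);
$s = $s * $i;                    # overloaded '*' returns a separate result object
my $copied = (refaddr($s) != $addr1) ? 1 : 0;

print "value: $s\n";
print "'*=' kept the same object:  $in_place\n";
print "'*' produced a new object:  $copied\n";
```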

[Update: revised to show just one version of the code, with sub-second timings. There is room to think that the printing should be excluded from the timing calculations; that makes things messier though, so I haven't bothered. I have fixed the bug in the previous version: loop 4 was the same as loop 3 - now it isn't. I've also tarted up the output formatting (though the sub-second handling could be improved - an exercise for the reader), and there is better 'progress reporting'.]

The timing results on a Mac Mini (Snow Leopard 10.6.2) were:

Count up   $s *= $i:      00:00:12.663337
Count up   $s  = $s * $i: 00:00:20.686111
Count down $s *= $i:      00:00:14.201797
Count down $s  = $s * $i: 00:00:23.269874

The script:

use Time::HiRes qw(gettimeofday);
use strict;
use warnings;
use bigint;
use constant factorial_of => 13000;

sub delta_t
{
    my($tag, $t1, $t2) = @_;
    my($d) = int($t2 - $t1);
    my($f) = ($t2 - $t1) - $d;
    my($s) = sprintf("%.6f", $f);
    $s =~ s/^0//;
    printf "%-25s %02d:%02d:%02d%s\n",
           $tag, int($d/3600), int(($d % 3600) / 60), int($d % 60), $s;
}

my $t1 = gettimeofday;

{
    my $n = factorial_of;
    my $s = 1;
    for (my $i = 2; $i <= $n; $i++)
    {
        $s *= $i;
    }
    print "$s\n: Loop 1\n";
}

my $t2 = gettimeofday;
delta_t('Count up   $s *= $i:',      $t1, $t2);

{
    my $n = factorial_of;
    my $s = 1;
    for (my $i = 2; $i <= $n; $i++)
    {
        $s = $s * $i;
    }
    print "$s\n: Loop 2\n";
}

my $t3 = gettimeofday;
delta_t('Count up   $s *= $i:',      $t1, $t2);
delta_t('Count up   $s  = $s * $i:', $t2, $t3);

{
    my $n = factorial_of;
    my $s = 1;
    for (my $i = $n; $i > 1; $i--)
    {
        $s *= $i;
    }
    print "$s\n: Loop 3\n";
}

my $t4 = gettimeofday;
delta_t('Count up   $s *= $i:',      $t1, $t2);
delta_t('Count up   $s  = $s * $i:', $t2, $t3);
delta_t('Count down $s *= $i:',      $t3, $t4);

{
    my $n = factorial_of;
    my $s = 1;
    for (my $i = $n; $i > 1; $i--)
    {
        $s = $s * $i;
    }
    print "$s\n: Loop 4\n";
}

my $t5 = gettimeofday;
delta_t('Count up   $s *= $i:',      $t1, $t2);
delta_t('Count up   $s  = $s * $i:', $t2, $t3);
delta_t('Count down $s *= $i:',      $t3, $t4);
delta_t('Count down $s  = $s * $i:', $t4, $t5);

And here's a much more compact version of the code above, extended to test 'while' loops as well as 'for' loops. It also deals with most of the timing issues. The only thing that isn't ideal (to me) is that it uses a couple of global variables, and I scrunched the code in the code refs slightly so it all fits on one line without triggering a scroll bar (on my display, anyway). Clearly, with a bit more work, the testing could be wrapped up into an array, so that the testing would be done iteratively - a loop through the array running the timer function on the information in the array. Etc...it's a SMOP - Simple Matter of Programming. (It prints the MD5 hash of the factorial, rather than the factorial itself, because it is easier to compare the results, etc. It did point out a couple of errors as I was refactoring the code above. Yes, MD5 is not secure - but I'm not using it for security; just to spot unintentional changes.)

use Time::HiRes qw(gettimeofday);
use Digest::MD5 qw(md5_hex);
use strict;
use warnings;
use bigint;
use constant factorial_of => 13000;

my ($s, $i);

my $l1 = sub {my($n) = @_; for ($i = 2;  $i <= $n; $i++) { $s *= $i;     }};
my $l2 = sub {my($n) = @_; for ($i = 2;  $i <= $n; $i++) { $s = $s * $i; }};
my $l3 = sub {my($n) = @_; for ($i = $n; $i > 1;   $i--) { $s *= $i;     }};
my $l4 = sub {my($n) = @_; for ($i = $n; $i > 1;   $i--) { $s = $s * $i; }};
my $l5 = sub {my($n) = @_; $i = 2;  while ($i <= $n) { $s *= $i;     $i++; }};
my $l6 = sub {my($n) = @_; $i = 2;  while ($i <= $n) { $s = $s * $i; $i++; }};
my $l7 = sub {my($n) = @_; $i = $n; while ($i > 1)   { $s *= $i;     $i--; }};
my $l8 = sub {my($n) = @_; $i = $n; while ($i > 1)   { $s = $s * $i; $i--; }};

sub timer
{
    my($n, $code, $tag) = @_;
    my $t1 = gettimeofday;
    $s = 1;
    &$code(factorial_of);
    my $t2 = gettimeofday;
    my $md5 = md5_hex($s);
    printf "Loop %d: %-33s %09.6f (%s)\n", $n, $tag, $t2 - $t1, $md5;
}

my $count = 1;
timer($count++, $l1, 'for   - Count up   $s *= $i:');
timer($count++, $l2, 'for   - Count up   $s  = $s * $i:');
timer($count++, $l3, 'for   - Count down $s *= $i:');
timer($count++, $l4, 'for   - Count down $s  = $s * $i:');
timer($count++, $l5, 'while - Count up   $s *= $i:');
timer($count++, $l6, 'while - Count up   $s  = $s * $i:');
timer($count++, $l7, 'while - Count down $s *= $i:');
timer($count++, $l8, 'while - Count down $s  = $s * $i:');

Example output (MD5 checksum compressed to avoid line breaking - the full value is 584b3ab832577fd1390970043efc0ec8):

Loop 1: for   - Count up   $s *= $i:      12.853630 (584b3ab8...3efc0ec8)
Loop 2: for   - Count up   $s  = $s * $i: 20.854735 (584b3ab8...3efc0ec8)
Loop 3: for   - Count down $s *= $i:      14.798155 (584b3ab8...3efc0ec8)
Loop 4: for   - Count down $s  = $s * $i: 23.699913 (584b3ab8...3efc0ec8)
Loop 5: while - Count up   $s *= $i:      12.972428 (584b3ab8...3efc0ec8)
Loop 6: while - Count up   $s  = $s * $i: 21.192956 (584b3ab8...3efc0ec8)
Loop 7: while - Count down $s *= $i:      14.555620 (584b3ab8...3efc0ec8)
Loop 8: while - Count down $s  = $s * $i: 23.790795 (584b3ab8...3efc0ec8)

I consistently see a small (<1%) penalty for the 'while' loop over the corresponding 'for' loop, but I don't have a good explanation for it.
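
The array-driven variant mentioned above might look roughly like the sketch below. It rests on a few assumptions that are not in the code above: it uses Math::BigInt explicitly instead of the bigint pragma (so the loop counter stays a native integer), it uses a much smaller factorial (2000) so that it finishes quickly, it covers only four of the eight loops, and the names @tests, new_one and run_tests are hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);
use Digest::MD5 qw(md5_hex);
use Math::BigInt;

my $N = 2000;    # assumption: far smaller than 13000, to keep the run short

sub new_one { return Math::BigInt->new(1) }

# Each entry is [ tag, code ref ]; every code ref returns the factorial of $n.
my @tests = (
    [ 'for   - Count up   $s *= $i:' => sub {
        my ($n) = @_; my $s = new_one();
        for (my $i = 2; $i <= $n; $i++) { $s *= $i; }
        return $s;
    } ],
    [ 'for   - Count down $s  = $s * $i:' => sub {
        my ($n) = @_; my $s = new_one();
        for (my $i = $n; $i > 1; $i--) { $s = $s * $i; }
        return $s;
    } ],
    [ 'while - Count up   $s *= $i:' => sub {
        my ($n) = @_; my $s = new_one(); my $i = 2;
        while ($i <= $n) { $s *= $i; $i++; }
        return $s;
    } ],
    [ 'while - Count down $s  = $s * $i:' => sub {
        my ($n) = @_; my $s = new_one(); my $i = $n;
        while ($i > 1) { $s = $s * $i; $i--; }
        return $s;
    } ],
);

# Run every entry in turn, timing it and checksumming its result.
sub run_tests {
    my ($n, @list) = @_;
    my @results;
    for my $idx (0 .. $#list) {
        my ($tag, $code) = @{ $list[$idx] };
        my $t1  = gettimeofday();      # scalar context: floating-point seconds
        my $s   = $code->($n);
        my $t2  = gettimeofday();
        my $md5 = md5_hex("$s");       # checksum of the decimal string
        printf "Loop %d: %-33s %09.6f (%s)\n", $idx + 1, $tag, $t2 - $t1, $md5;
        push @results, { tag => $tag, secs => $t2 - $t1, md5 => $md5 };
    }
    return @results;
}

my @results = run_tests($N, @tests);
```

Adding another variant is then a matter of appending one more [ tag, code ref ] pair to @tests.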

Jonathan Leffler
@Jonathan Leffler, thanks a lot! Your illustration code is very instructive to me. Thanks :)
Mike
@Jonathan, thanks for the updated code. I always thought '$s *= $i' and '$s = $s * $i', and $i++ and $i--, were doing the same thing in different ways, but I was wrong. Thanks very much for pointing this out :) I've now changed the while vs for scripts, and now I've got: my $now = time; my $n = shift; my $i = 2; my $s = 1; for (; $i <= $n; $i++) { $s *= $i; } And: my $n = shift; my $i = 2; my $s = 1; while ($i <= $n) { $s *= $n; $i++; } They look similar. Result: while is faster. I'm not sure, but is there anything wrong with my experiment design? I've run your code, and the result was that while is slower.
Mike
@Mike: I don't have a good feel for what the residual issues are. The main points are (1) the problem is primarily in 'bigint' and (2) it is likely that the residual 'while vs for' differences are buried deep in the Perl byte code. I got some variation in the timings - mostly of the order of 0.1 seconds or so, unless there was also a backup running (TimeMachine to a TimeCapsule); I chose 13000 as my test number to get a big enough number to get sensible times while not being so big as to make it uncomfortable to run tests (1 hour is too long, for example).
Jonathan Leffler
I just tried the code on my MacBook Pro (3GHz, 4GB). With a 32-bit Perl, the times were slower than on the Mac Mini (20-30 seconds instead of 12-23); with a 64-bit Perl, the times were quicker (7-15 seconds). And typing out the ranges like that, I see there is a 2:1 ratio between best and worst performance, roughly. But the main difference is not 'while' vs 'for'. At this point, I'd go with whatever works for you - aware that bigint calculations are somewhat sensitive to the way they're written.
Jonathan Leffler
+4  A: 

One key to benchmarking is to simplify. The issue posed is the speed of for vs. while. But the experiment involves several unnecessary complexities.

  • The two loops are not as similar as they could be. One uses $s *= $n and the other uses $s = $s * $i (as Jonathan Leffler points out). One uses $n-- and the other uses $i++ (who knows whether they differ in speed?).

  • If we are interested in for vs. while, there is no need to involve bigint. That only confuses the topic. In particular, your while script depends on just one bigint object ($s), whereas your for script uses two of them ($s and $i). It doesn't surprise me that the for script is slower.
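
The second point can be checked directly. Under bigint, integer literals become Math::BigInt objects, whereas a value taken from the command line is an ordinary string. A minimal sketch, in which the variable names and the hard-coded "5" merely stand in for the scripts' $i and the shifted argument:

```perl
use strict;
use warnings;
use bigint;

my $i = 2;      # integer literal: becomes a Math::BigInt object under bigint
my $n = "5";    # stand-in for 'my $n = shift;': a plain string, not an object

my $i_class = ref($i);    # class name when $i holds an object
my $n_class = ref($n);    # empty string for a non-reference

print "\$i is a ", ($i_class || 'plain scalar'), "\n";
print "\$n is a ", ($n_class || 'plain scalar'), "\n";

$i++;           # handled by Math::BigInt's overloaded increment
$n--;           # ordinary numeric decrement on a plain scalar
print "after ++/--: \$i is a ", (ref($i) || 'plain scalar'),
      ", \$n is a ", (ref($n) || 'plain scalar'), "\n";
```

So in the for script every $i++ and every $i <= $n goes through the bigint machinery, on top of the multiplication itself.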

Rewrite your loops to be as similar as possible, keep the factorials small enough so that you don't have to use bigint, and use the Benchmark module. Then you can run a fair head-to-head race of for vs. while. I'll be curious to see what you find.
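
A head-to-head comparison along these lines might look like the sketch below. It uses cmpthese from the core Benchmark module; the helper names fact_for and fact_while, and the choice of 15 (whose factorial still fits in a native number), are assumptions made for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $N = 15;    # assumption: small enough that no bigint is needed

# Same body, same direction, same operator; only the loop keyword differs.
sub fact_for {
    my ($n) = @_;
    my $s = 1;
    for (my $i = 2; $i <= $n; $i++) {
        $s *= $i;
    }
    return $s;
}

sub fact_while {
    my ($n) = @_;
    my $s = 1;
    my $i = 2;
    while ($i <= $n) {
        $s *= $i;
        $i++;
    }
    return $s;
}

# Sanity check before timing: both versions must agree.
die "results differ\n" unless fact_for($N) == fact_while($N);

# A negative count asks Benchmark to run each entry for at least that many CPU seconds.
cmpthese(-1, {
    'for'   => sub { fact_for($N) },
    'while' => sub { fact_while($N) },
});
```

cmpthese prints a table of rates and relative percentages. It is worth running it several times on an otherwise idle machine before reading anything into a small difference.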

FM
@FM, my experiment was so poorly designed that my inference from the result was almost totally irrelevant to the question I posted. This was a total failure. Well, anyway thanks for leaving me these instructive comments. It looks like I can always learn a thing or two from you guys :)
Mike
@Mike Don't be too tough on yourself. Benchmarking is tricky, and even experienced programmers make mistakes in setting them up. For example: http://stackoverflow.com/questions/1083269/is-perls-unpack-ever-faster-than-substr and http://stackoverflow.com/questions/1960779/why-does-perls-tr-n-get-slower-and-slower-as-line-lengths-increase. Your benchmark may have been flawed, but the question was successful because you learned some useful things. :)
FM