Hey guys, I do not understand autoincrementing letters in Perl.

This example seems perfectly understandable:

$a = 'bz'; ++$a;
ca #output

b gets incremented to c. There is nothing left for z to go to, so it goes back to a (or at least this is how I see the process).

But then I come across statements like this:

$a = 'Zz'; ++$a;
AAa #output

and:

$a = '9z'; ++$a;
10 #output

Why doesn't incrementing Zz return Aa? And why doesn't incrementing 9z return 0z?

Thanks!

A: 

I don't see why incrementing Zz would return Aa; why do you think it should? As for 9z, it looks like Perl treats it as the number 9 rather than as some kind of base-36 value.

Kinopiko
+3  A: 

The answer is to not do that. The automagic incrementing of ++ with non-numbers is full of nasty pitfalls. It is suitable only for quick hacks.

You are better off writing your own iterator for this sort of thing:

#!/usr/bin/perl

use strict;
use warnings;

{ package StringIter;

    sub new {
        my $class = shift;
        my %self  = @_;
        $self{set}   = ["a" .. "z"] unless exists $self{set};
        $self{value} = -1           unless exists $self{value};
        $self{size}  = @{$self{set}};

        return bless \%self, $class;
    }

    sub increment {
        my $self = shift;
        $self->{value}++;
    }

    sub current {
        my $self = shift;
        my $n    = $self->{value};
        my $size = $self->{size};
        my $s    = "";

        while ($n >= $size) {
            my $offset  = $n % $size;
            $s          = $self->{set}[$offset] . $s;
            $n          = int($n / $size);   # integer division; plain /= leaves a fraction
        }
        $s = $self->{set}[$n] . $s;

        return $s;
    }

    sub next {
        my $self = shift;
        $self->increment;
        return $self->current;
    }
}

my $iter = StringIter->new;

for (1 .. 100) {
    print $iter->next, "\n";
}

$iter = StringIter->new(set => [0, 1]);   # reuse $iter; a second "my" would mask the first

for (1 .. 7) {
    print $iter->next, "\n";
}
Chas. Owens
Agree with Chas. Some of Perl's more esoteric features are best left alone for code clarity. You probably won't understand a thing you wrote after a while if you use these obscure features.
GeneQ
+17  A: 

To quote perlop:

If, however, the variable has been used in only string contexts since it was set, and has a value that is not the empty string and matches the pattern /^[a-zA-Z]*[0-9]*\z/, the increment is done as a string, preserving each character within its range, with carry.

The ranges are 0-9, A-Z, and a-z. When a new character is needed, it is taken from the range of the first character. Each range is independent; characters never leave the range they started in.

9z does not match the pattern, so it gets a numeric increment. (It probably ought to give an "Argument isn't numeric" warning, but it doesn't in Perl 5.10.1.) Digits are allowed only after all the letters (if any), never before them.
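As a rough illustration of that rule, the check below applies the perlop pattern to a few sample strings. The helper name is_magic is made up for this sketch, and the check ignores the other condition from the quote (that the variable has only been used in string contexts).

```perl
use strict;
use warnings;

# Hypothetical helper: true if a string qualifies for the string increment.
# The pattern is the one quoted from perlop; the empty string is excluded.
sub is_magic {
    my ($s) = @_;
    return $s ne '' && $s =~ /^[a-zA-Z]*[0-9]*\z/ ? 1 : 0;
}

for my $s ('bz', 'Zz', 'a9', '9z', 'a1b') {
    printf "%-4s %s\n", $s,
        is_magic($s) ? 'string increment' : 'numeric increment';
}
```

Only 9z and a1b fail the check here: in both, a letter follows a digit.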

Note that an all-digit string does match the pattern, and does receive a string increment (if it's never been used in a numeric context). However, the result of a string increment on such a string is identical to a numeric increment, except that it has infinite precision and leading zeros (if any) are preserved. (So you can only tell the difference when the number of digits exceeds what an IV or NV can store, or it has leading zeros.)
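A small sketch of the leading-zero case, assuming the first variable really is never used as a number before the increment (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $padded = '000123';   # only ever used as a string
++$padded;               # string increment: leading zeros survive
print "$padded\n";       # 000124

my $number = '000123';
$number += 0;            # numeric use turns the value into 123
++$number;               # ordinary numeric increment
print "$number\n";       # 124
```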

I don't see why you think Zz should become Aa (unless you're thinking of modular arithmetic, but this isn't). It becomes AAa through this process:

  1. Incrementing z wraps around to a. Increment the previous character.
  2. Incrementing Z wraps around to A. There is no previous character, so add the first one from this range, which is another A.
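
To watch the carry happen, something like the following can be run. The succ helper is a made-up name for this sketch; it just increments a copy of its argument.

```perl
use strict;
use warnings;

# Hypothetical helper: return the string increment of a copy of $s.
sub succ {
    my ($s) = @_;
    return ++$s;
}

for my $start ('bz', 'Az', 'zz', 'Zz', 'a9') {
    print "$start -> ", succ($start), "\n";
}
# bz -> ca
# Az -> Ba
# zz -> aaa
# Zz -> AAa
# a9 -> b0
```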

The range operator (..), when given two strings (and the left-hand one matches the pattern), uses the string increment to produce a list (this is explained near the end of that section). The list starts with the left-hand operand, which is then incremented until either:

  1. The value equals the right-hand operand, or
  2. The length of the value exceeds the length of the right-hand operand.

It returns a list of all the values. (If case 2 terminated the list, the final value is not included in it.)
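
Both stopping rules can be seen in a short sketch like this one (the array names are arbitrary):

```perl
use strict;
use warnings;

# Case 1: the increment eventually reaches the right-hand operand.
my @full = ('aa' .. 'cc');
print scalar(@full), " values\n";      # 55
print "@full[0, 1, 25, 26, -1]\n";     # aa ab az ba cc

# Case 2: 'A' is never reached, so the list ends before the
# value would become longer than the right-hand operand.
my @short = ('y' .. 'A');
print "@short\n";                      # y z
```

The 55 values are aa through az, ba through bz, and then ca, cb, cc.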

cjm
Thanks for your answers everyone. The problem I'm having is that I don't understand how the flip flop operator works. When I see aa .. cc I picture: aa ab ac ca cb cc, but instead I'm getting a and b going all the way down to z when I didn't tell it to.
Brian
@Brian, that's the range operator, not the flip-flop operator. They're spelled the same, but one occurs only in list context, and the other only in scalar context.
cjm
@Brian, do you mean you don't understand, or you didn't understand? You did tell it to generate aa ab ... ay az ba bb ... by bz ca cb cc (even if that's not what you meant to say).
cjm
I'm not seeing many explanations online about how this works. By looking at the pattern, it looks like the rightmost character in the first operand will go until z, whereas the rightmost character in the second operand will go backwards until it meets a. I'm sure there is a better way to explain it, it gets more confusing when I do something like: aa .. cb;
Brian
@Brian, it's not a pattern. It starts with the left-hand operand. It increments that until it equals the right-hand operand, and returns a list of all the values. Letters cycle through A-Z, digits through 0-9.
cjm
haha thanks cjm, I get it now. I guess I shouldn't learn Perl at 4am. It's quite easy to understand once you get it! :P Thanks.
Brian
@Brian: The important words in the perldoc quote are "with carry", which is what cjm explained in his last comment.
dolmen
The most important consequence of string-incrementing strings of all digits is that incrementing "000123" gives "000124", and not just 124. This is important if the values are later going to be sorted as strings :)
hobbs
+5  A: 
  1. Because (ignoring case for the moment; case is merely preserved, nothing interesting happens with it), 'AA' is the successor to 'Z', so how could it also be the successor to 'ZZ'? The successor to 'ZZ' is 'AAA'.

  2. Because as far as ++ and all other numeric operators are concerned, "9z" is just a silly way of writing 9, and the successor to 9 is 10. The special string behavior of auto-increment is clearly specified to only occur on strings of letters, or strings of letters followed by numbers (and not mixed in any other way).

hobbs
+2  A: 

You're asking why increment doesn't wrap around.

If it did it wouldn't really be an increment. To increment means you have a totally ordered set and an element in it and produce the next higher element, so it can never take you back to a lower element. In this case the ordering puts shorter strings first and then goes alphabetically (which is only defined on the English alphabet), extended to cope with letters followed by digits in a way that seems natural for certain common types of identifier strings.

Wrapping would also defeat its purpose: usually you want to use it to generate arbitrarily many different identifiers of some sort.

I agree with Chas Owens's verdict: applying this operation to arbitrary strings is a bad idea; that's not the sort of use it was intended for.

I disagree with his remedy: just pick a simple starting value on which increment behaves sanely, and you'll be fine.
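
For instance, a minimal sketch that starts from 'a' and collects the first 28 identifiers (the starting value and the count are arbitrary choices for illustration):

```perl
use strict;
use warnings;

my $id = 'a';            # simple starting value: letters only
my @ids;
for (1 .. 28) {
    push @ids, $id;      # record the current identifier
    ++$id;               # string increment with carry
}
print "@ids[0, 25, 26, 27]\n";   # a z aa ab
```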

reinierpost