tags:

views:

105

answers:

2

I copied this example from perldoc -f sort. I added the @old array and the prints of the @new arrays. Why do I get three different @new arrays? Is there something wrong with my @old?

@old = qw( =332 =43 =avxc =aed =jjj =3322 =aa44 =ssss );
say "\nold  :   @old\n";

# inefficiently sort by descending numeric compare using
# the first integer after the first = sign, or the
# whole record case-insensitively otherwise
@new = sort {
($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
||
uc($a) cmp uc($b)
} @old;
say "new_1:   @new"; # =3322 =332 =43 =aa44 =aed =avxc =jjj =ssss

# same thing, but much more efficiently;
# we'll build auxiliary indices instead
# for speed
@nums = @caps = ();
for (@old) {
push @nums, /=(\d+)/;
push @caps, uc($_);
}
@new = @old[ sort {
$nums[$b] <=> $nums[$a]
||
$caps[$a] cmp $caps[$b]
} 0..$#old
];
say "new_2:   @new"; # =avxc =332 =43 =3322 =aa44 =aed =jjj =ssss

# same thing, but without any temps
@new = map { $_->[0] }
sort { $b->[1] <=> $a->[1]
||
$a->[2] cmp $b->[2]
} map { [$_, /=(\d+)/, uc($_)] } @old;
say "new_3:   @new\n"; # =3322 =332 =43 =avxc =aed =jjj =aa44 =ssss
+3  A: 

Empty lists

The three lists are different because the regular expression

/=(\d+)/

may return an empty list (when it doesn't match) --- messing up your structures. I inserted the following lines into your code and got the indicated answers:

say "nums: @nums";  # nums: 332 43 3322
say "caps: @caps";  # caps: =332 =43 =AVXC =AED =JJJ =3322 =AA44 =SSSS

You see, any item that doesn't match the regexp is missing in @nums and messes up your list. This can be fixed by changing the definition of @nums to:

push @nums, /=(\d+)/? $1:undef;

(Edit: added clarification) The same problem happens in your last example:

For $_="=123", [$_, /=(\d+)/, uc($_)] = ["123",  123,  123].
For $_="abcd", [$_, /=(\d+)/, uc($_)] = ["abcd", "ABCD"].

The match disappears and the uppercased string moves into its place. This is not what you intended. One fix is to replace, as above, the regular expression by an expression that always produces exactly one scalar:

[$_, /=(\d+)/? $1:undef, uc($_)]

Another solution would be to swap the two last elements in the list:

@new = map { $_->[0] }                                                          
sort { $b->[2] <=> $a->[2]                                                      
||                                                                              
$a->[1] cmp $b->[1]                                                             
} map { [$_, uc($_), /=(\d+)/] } @old;                                          
say "new_3:   @new\n";

Now, the regular expression is at the end. If it produces no matches, the list is short (just two elements). Nevertheless, $a->[2] produces the desired result: undef. (You may or may not like this approach of letting the bug bite but have the correct result come out as a side-effect of perl reading beyond the end of the shorter lists).

With these fixes, all three lists produce the same result.

Warnings

Please run your program with the "-w" warnings switch on. You find that you compare quite a lot of undefined values and non-numerics. You should fix your program to do without this, e.g:

@new = map { $_->[0] }                                                          
sort {                                                                          
defined($b->[2]) && defined($a->[2]) && ($b->[2] <=> $a->[2])                   
||                                                                              
$a->[1] cmp $b->[1]                                                             
} map { [$_, uc($_), /=(\d+)/] } @old;                                          
say "new_4:   @new\n";
Yaakov Belch
Could you explain me the sentence: "If you put the match last in your temporaries, an empty list will produce undef correctly".Why does it make a difference, when I put the match at the end.
sid_com
Yes, the docs are broken here. I'll fix them. Thanks.
brian d foy
I applied this fix in commit e1d16ab77ed to perl blead. I forgot to mention you by name in the commit message, but I did point at this question.
brian d foy
Will the changes get done with the next version of perl (if they change)?
sid_com
@sid_com: I edited the post to clarify this sentence. The difference is: If an element is missing at the end, it does not cause later elements to move to other places (as there are no later elements).
Yaakov Belch
+1  A: 

Those are three examples of sorting an array using a user defined comparator. Each example outputs the sorted array after it's run.

Since there are three examples, the sorted array is printed three times. Of course, you are not expected to use all three versions when you want to sort.

The last example uses the Schwartzian Transform.

I think the original author's intention was for all examples to produce the same sorted output, but due to the bugs explained by Yaakov Belch, they don't.

Note that your data do not match the expectations of the comparator you are using. The comparator your are using does expect all records to be of the form =\d.+ (i.e. an equal sign followed by a digit followed by arbitrary characters). If all records do not conform to this format, you need to be careful in the comparator.

Fixed first and third examples below. I don't think there is any need to use example 2: Either it pays to pre-compute or it doesn't. If it does, use the Schwartzian Transform. If it doesn't, use the regular sort that extracts keys for each comparison.

#!/usr/bin/perl

use 5.010;
use strict; use warnings;

my @old = qw( =332 =43 =avxc =aed =jjj =3322 =aa44 =ssss );

my @new_1 = sort {
    my $ad = $a =~ /=(\d+)/ ? $1 : undef;
    my $bd = $b =~ /=(\d+)/ ? $1 : undef;
    return  1 if defined $bd and not defined $ad;
    return -1 if not defined $bd and defined $ad;
    return $bd <=> $ad if defined $ad and defined $bd;
    return uc $a cmp uc $b;
} @old;

say "new_1:\t@new_1\n"; # =3322 =332 =43 =avxc =aed =jjj =aa44 =ssss

my @new_3 = map { $_->[0] }
    sort {
        return  1 if defined $b->[1] and not defined $a->[1];
        return -1 if not defined $b->[1] and defined $a->[1];
        return  $b->[1] <=> $a->[1] if defined $b->[1] and defined $a->[1];
        return  $a->[2] cmp $b->[2];
} map { [$_, /=(\d+)/ ? $1 : undef, uc($_)] } @old;

say "new_3:\t@new_3\n"; # =3322 =332 =43 =avxc =aed =jjj =aa44 =ssss

Output:

new_1:  =3322 =332 =43 =aa44 =aed =avxc =jjj =ssss

new_3:  =3322 =332 =43 =aa44 =aed =avxc =jjj =ssss
Sinan Ünür