I have encountered some strange Perl behavior: using a Posix character class in a regexp completely alters the sort order for the resulting strings.
Here is my test program:
sub namecmp($a,$b) {
$a=~/([:alpha:]*)/;
# $a=~/([a-z]*)/;
$aword= $1;
$b=~/([:alpha:]*)/;
# $b=~/([a-z]*)/;
$bword= $1;
return $aword cmp $bword;
};
$_= <>;
@names= sort namecmp split;
print join(" ", @names), "\n";
If you change to the commented-out regexp's using [a-z], you get the normal, lexicographic sort order. However, the Posix [:alpha:] character class yields some weird-ass sort order, as follows:
$test_normal
aaa aab aac aba abb abc aca acb acc baa bab bac bba bbb bbc bca bcb bcc caa cbb
aaa aab aac aba abb abc aca acb acc baa bab bac bba bbb bbc bca bcb bcc caa cbb
$test_posix
aaa aab aac aba abb abc aca acb acc baa bab bac bba bbb bbc bca bcb bcc caa cbb
baa bab bac bba bbb bbc bca bcb bcc caa cbb aba abb abc aca acb acc aab aac aaa
My best guess is that the Posix character class is activating some kind of locale stuff I've never heard of and didn't ask for. I suppose the logical reaction to "doctor, doctor, it hurts when I do this!" is, "well, don't do that, then!".
But, can anyone tell me what's happening here, and why? I'm using perl 5.10, but I believe it also works under perl 5.8.