The question on Hidden features of Perl yielded at least one response that could be regarded as either a feature or a mis-feature. It seemed logical to follow up with this question: what are common non-obvious mistakes in Perl? Things that seem like they ought to work, but don't.

I won't give guidelines as to how to structure answers, or what's "too easy" to be considered a gotcha, since that's what the voting is for.

Table of Answers

Syntax

Semantics/Language Features

Debugging

Best Practices

Meta-Answers

See Also: ASP.NET - Common gotchas

+24  A: 

The fact that single quotes can be used to replace :: in identifiers.

Consider:

use strict;
print "$foo";        #-- Won't compile under use strict
print "$foo's fun!"; #-- Compiles just fine, refers to $foo::s

Leading to the following problem:

use strict;
my $name = "John";
print "$name's name is '$name'";
# prints:
#  name is 'John'

The recommended way to avoid this is to use braces around your variable name:

print "${name}'s name is '$name'";
# John's name is 'John'

And also to use warnings, since it'll tell you about the use of the undefined variable $name::s

Adam Bellaire
This, of course, is a hangover from Perl 4 that was necessary for backwards compatibility.
Jonathan Leffler
You're correct Jonathan, but it's about time there was a pragma to *enable* this so-called feature. Perl 4 package methods are like Latin. Supporting them in this day and age is more of a gotcha than a benefit IMHO.
RET
This is a feature if you want to have identifiers written in Klingon. ;) I agree, it totally sucks otherwise.
pjf
That's crazy, I've never come across this and use apostrophes pretty often. Learning lots on this question!
Kev
+9  A: 

The most common gotcha is to start your files with anything different than

use strict;
use diagnostics;

pjf adds: Please be warned that diagnostics has a significant impact on performance. It slows program start-up, as it needs to load perldiag.pod, and until bleadperl as of a few weeks ago, it also slows and bloats regexps because it uses $&. Using warnings and running splain on the results is recommended.

Vinko Vrsalovic
s/diagnostics/warnings/
Michael Carman
I like diagnostics better
Vinko Vrsalovic
pjf
+9  A: 

Confusing references and actual objects:

$a = [1,2,3,4];
print $a[0];

(It should be one of $a->[0] (best), $$a[0], @{$a}[0] or @$a[0])
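
A minimal sketch of the working forms, assuming a made-up reference name $aref (chosen instead of $a, which is special-cased for sort and so hides part of the problem under strict):

```perl
use strict;
use warnings;

# $aref is an illustrative name, not taken from the snippet above.
my $aref = [1, 2, 3, 4];         # a scalar holding a reference to an anonymous array

my $via_arrow  = $aref->[0];     # dereference with the arrow operator
my $via_sigils = $$aref[0];      # same element, sigil-style dereference
my $via_block  = ${$aref}[0];    # same again, with an explicit block
my $length     = scalar @$aref;  # number of elements behind the reference

print "$via_arrow $via_sigils $via_block $length\n";   # prints "1 1 1 4"
```

Writing $aref[0] instead would look for an unrelated array named @aref, which is the mistake described above.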

Vinko Vrsalovic
$a->[0] is cleaner (and avoids a needless one element slice)
dland
I would have thought $a->[0]
dsm
Wow, I never fully understood that and I wrote Perl for over a year. I knew -> fixed it, but I never knew why. That makes complete sense!
Abyss Knight
I always forget the best way :-)
Vinko Vrsalovic
This is immediately caught with "use strict;"
runrig
not quite, it complains about a var @a not being imported, which leaves you almost as clueless as before if you aren't aware of the issue. And you could have an @a array declared before, that would get even more confusing. Also, see my other contribution to the thread :)
Vinko Vrsalovic
I write $$a[0], one character shorter than $a->[0].
ephemient
Vinko: It also complains that "Global symbol @a requires explicit package name" which is the usual error. It does not complain about $a since it is auto-magically-imported, but if you say "my $a" then you only get the "Global symbol...@a" error. And people who use $a and @a deserve what they get :-)
runrig
...that should be "people who use $a and @a on purpose in the same scope" deserve what they get :-)
runrig
What's the "one element slice"? Is it slower? Does it apply to $$a[0] (which looks nicer to me)?
Kev
@Kev: IIRC, no, $$a[0] is a straight deference. @$a[0] or @{$a}[0] are one-element slices because they first take the reference and get the whole array out (with @), and the braces [] are then considered to be a slice out of that array which happens to be only one element, rather than an index.
Adam Bellaire
deference == dereference :)
Adam Bellaire
BTW, I honestly don't know whether it's slower. A slice will definitely be slower. I don't know if Perl knows how to optimize single-element slices into index lookups or under what circumstances it might try.
Adam Bellaire
+12  A: 

Assigning arrays to scalars makes no sense to me. For example:

$foo = ( 'a', 'b', 'c' );

Assigns 'c' to $foo and throws the rest of the array away. This one is weirder:

@foo = ( 'a', 'b', 'c' );
$foo = @foo;

This looks like it should do the same thing as the first example, but instead it sets $foo to the length of @foo, so $foo == 3.

Graeme Perrow
In the first example, you have the comma operator in scalar context (no array there), so you end up with the last thing. In the second example you have an array, so you get the behavior you describe.
brian d foy
Another way of saying this is that in the first example you have a List but in the second example you have an Array.
Dave Webb
if you put parens around the $foo in the last example $foo becomes 'a'
Brad Gilbert
@brian: Yes, but it's pointless distinctions like this that make Perl so damn confusing and annoying.
j_random_hacker
It's not a pointless distinction, contexts are a fundamental (and useful) part of the language. Are they easy to grasp? No. But they aren't pointless. You yourself gave an example in your answer: Both uses of the 'x' operator are useful. You need to establish context to get what you want.
Adam Bellaire
@Adam: You're right, context can certainly be useful. I guess it's just that often Perl's idea of context works in a way that is contrary to my own intuition. Had I designed Perl, there would be distinct scalar and list contexts, but no distinct "array context."
j_random_hacker
+3  A: 

You can't localize exported variables unless you export the entire typeglob.
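
A sketch of what this looks like in practice. The package Counter and all variable names are made up, and the "exports" are done by hand rather than through Exporter so that the two cases sit side by side:

```perl
use strict;
use warnings;

{
    package Counter;                 # hypothetical module
    our $by_ref  = 'original';
    our $by_glob = 'original';
    sub by_ref  { return $by_ref }   # reads $Counter::by_ref
    sub by_glob { return $by_glob }  # reads $Counter::by_glob
}

# Hand-rolled "exports" into main:
*main::by_ref  = \$Counter::by_ref;  # only the scalar slot is shared
*main::by_glob = *Counter::by_glob;  # the entire typeglob is shared
our ($by_ref, $by_glob);

sub seen_by_counter {
    local $by_ref  = 'localized';
    local $by_glob = 'localized';
    return (Counter::by_ref(), Counter::by_glob());
}

my ($ref_result, $glob_result) = seen_by_counter();
print "$ref_result $glob_result\n";  # prints "original localized"
```

With only the scalar exported, local swaps a new value into main's own glob, so the exporting package never sees it.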

Michael Carman
+3  A: 

Using the /o modifier with a regex pattern stored in a variable.

m/$pattern/o

Specifying /o is a promise that $pattern won't change. Perl is smart enough to recognize whether or not it changed and recompile the regex conditionally, so there's no good reason to use /o anymore. Alternately, you can use qr// (e.g. if you're obsessed with avoiding the check).
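
To illustrate what happens when the promise is broken, here is a small sketch; the patterns and the test string are invented for the example:

```perl
use strict;
use warnings;

my (@frozen, @fresh, @precompiled);

for my $pattern ('cat', 'dog') {
    # With /o the regex is compiled once, on the first pass, using 'cat'.
    push @frozen, ('hotdog' =~ /$pattern/o ? 1 : 0);

    # Without /o the current value of $pattern is used each time.
    push @fresh, ('hotdog' =~ /$pattern/ ? 1 : 0);

    # qr// compiles explicitly, so it is clear when compilation happens.
    my $re = qr/$pattern/;
    push @precompiled, ('hotdog' =~ $re ? 1 : 0);
}

print "@frozen | @fresh | @precompiled\n";   # prints "0 0 | 0 1 | 0 1"
```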

Michael Carman
It is smart enough... not sure as of which version. See: http://www.perlmonks.org/?node_id=256053 Don't use /o, use qr//
runrig
Thanks, updated. Someone should submit a doc patch to perlop. (Particularly if they know the version of Perl where the change was made.)
Michael Carman
I should have linked to http://www.perlmonks.org/?node_id=256155 instead so that one doesn't have to search the entire thread.
runrig
+7  A: 
my $x = <>;
do { 
    next if $x !~ /TODO\s*[:-]/;
    ...
} while ( $x );

do is not a loop, so you cannot use next in it. It's an instruction to perform a block, with while acting as a statement modifier; it's the same thing as

$inc++ while <>;

This is despite the fact that it looks like a loop construct from the C family of languages.
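
For reference, a sketch of the doubled-braces idiom that perlsyn describes for this situation. The input lines are made up and come from an array rather than <> so the snippet is self-contained:

```perl
use strict;
use warnings;

# Stand-in for lines read from input; the contents are invented.
my @queue = ('TODO: write tests', 'all done here', 'TODO - fix docs');
my @todo;

my $x = shift @queue;
do {{
    # The inner braces form a bare block, which next is allowed to leave.
    next if $x !~ /TODO\s*[:-]/;
    push @todo, $x;
}} while (defined($x = shift @queue));

print scalar(@todo), "\n";   # prints "2"
```

A plain while loop is usually easier to read; the doubled braces only matter when the body has to run at least once.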

Axeman
Yeah, and the do{{}} workaround is rather ugly.
ephemient
use a bare block with redo `{my $x = <>; ... ; redo if $condition}`
Eric Strom
@Eric Strom: That's pretty cool. I don't know if I've ever used a `redo` in Perl before.
Axeman
@ephemient: What is this double braces workaround that you speak of?
sundar
@sundar `do { {next if $x !~ /TODO\s*[:-]/;} } while ($x);` Here, the `next` applies to the inner `{}`, which form a block, instead of the outer `do {}`, which is invalid.
ephemient
+16  A: 

You can print to a lexical filehandle: good.

print $out "hello, world\n";

You then realise it might be nice to have a hash of filehandles:

my %out;
open $out{ok},   '>', 'ok.txt' or die "Could not open ok.txt for output: $!";
open $out{fail}, '>', 'fail.txt' or die "Could not open fail.txt for output: $!";

So far, so good. Now try to use them, and print to one or the other according to a condition:

my $where = (frobnitz() == 10) ? 'ok' : 'fail';

print $out{$where} "it worked!\n"; # it didn't: compile time error

You have to wrap the hash dereference in a pair of curlies:

print {$out{$where}} "it worked!\n"; # now it did

This is completely non-intuitive behaviour. If you hadn't heard about this or read it in the documentation, I doubt you could figure it out on your own.
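
A self-contained sketch of the same idea, assuming in-memory filehandles in place of ok.txt and fail.txt so it can run anywhere; the %buffer hash is an addition for the example:

```perl
use strict;
use warnings;
use IO::Handle;   # lets older perls call methods on lexical filehandles

my (%out, %buffer);
for my $name (qw(ok fail)) {
    $buffer{$name} = '';
    # Writing to a scalar reference opens an in-memory file.
    open $out{$name}, '>', \$buffer{$name}
        or die "Could not open in-memory file for $name: $!";
}

my $where = 'ok';
print { $out{$where} } "it worked!\n";     # block form: any expression is allowed
$out{$where}->print("so did this\n");      # method form avoids the parsing issue
close $_ for values %out;

print $buffer{ok};
```

The method-call form is an alternative some people find easier to read than the extra curlies.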

dland
I've also been bitten by this... Highly non-discoverable as you say.
j_random_hacker
`print $out "text"` can be confused with `print $out, "text"` . `print {$out} "text"` won't.
Brad Gilbert
+4  A: 

This gotcha is fixed in perl 5.10 - if you're lucky enough to be working somewhere that isn't allergic to upgrading things >:-(

I speak of The Variable That's Validly Zero. You know, the one that causes unexpected results in clauses like:

unless ($x) { ... }
$x ||= do { ... };

Perl 5.10 has the //= or defined-or operator.
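
A short sketch of the difference, assuming a made-up %config hash in which 0 is a legitimate setting:

```perl
use strict;
use warnings;
use 5.010;   # the defined-or operators need perl 5.10 or later

my %config = (retries => 0);    # hypothetical settings; 0 is valid here

my $with_or  = $config{retries} || 3;   # 0 is false, so the default wins
my $with_dor = $config{retries} // 3;   # 0 is defined, so it is kept

my %filled = %config;
$filled{retries} //= 3;     # stays 0
$filled{timeout} //= 30;    # was undef, becomes 30

print "$with_or $with_dor $filled{retries} $filled{timeout}\n";   # prints "3 0 0 30"
```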

This is particularly insidious when the valid zero is caused by some edge-condition that wasn't considered in testing before your code went to production...

RET
The pre-5.10 fix for this is the horror that is "0 but true". There is a question on the site about this if someone wants to find more about this.
Dave Webb
"0 but true" doesn't work in all cases (i.e. the empty string). Even when it does work, it only works if you control the data source.
Michael Carman
YES! Everywhere I look I see examples of Perl code that will fail with inputs like "" or 0.
j_random_hacker
+11  A: 

Perl's DWIMmer struggles with << (here-document) notation when using print with lexical filehandles:

# here-doc
print $fh <<EOT;
foo
EOT

# here-doc, no interpolation
print $fh <<'EOT';
foo
EOT

# bitshift, syntax error
# Bareword "EOT" not allowed while "strict subs" in use
print $fh<<EOT;
foo
EOT

# bitshift, fatal error
# Argument "EOT" isn't numeric...
# Can't locate object method "foo" via package "EOT"...
print $fh<<'EOT';
foo
EOT

The solution is to either be careful to include whitespace between the filehandle and the << or to disambiguate the filehandle by wrapping it in {} braces:

print {$fh}<<EOT;
foo
EOT
Michael Carman
+3  A: 

If you're foolish enough to do so, Perl will allow you to declare multiple variables with the same name:

my ($x, @x, %x);

Because Perl uses sigils to identify context rather than variable type, this almost guarantees confusion when later code uses the variables, particularly if $x is a reference:

$x[0]
$x{key}
$x->[0]
$x->{key}
@x[0,1]
@x{'foo', 'bar'}
@$x[0,1]
@$x{'foo', 'bar'}
...
Michael Carman
+7  A: 

I did this once:

my $object = new Some::Random::Class->new;

Took me ages to find the error. Indirect method syntax is eeevil.

Dan
+6  A: 

Most of Perl's looping operators (foreach, map, grep) automatically localize $_ but while(<FH>) doesn't. This can lead to strange action-at-a-distance.
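
A sketch of the action-at-a-distance, with invented subroutine names and in-memory filehandles standing in for FH:

```perl
use strict;
use warnings;

sub count_lines_clobbering {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    my $count = 0;
    while (<$fh>) { $count++ }               # each line is assigned to the global $_
    return $count;
}

sub count_lines_safely {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    my $count = 0;
    while (my $line = <$fh>) { $count++ }    # lexical variable, $_ is left alone
    return $count;
}

my @clobbered = qw(a b);
count_lines_clobbering("x\ny\n") for @clobbered;   # $_ is aliased to each element

my @intact = qw(a b);
count_lines_safely("x\ny\n") for @intact;

my $lost = grep { !defined $_ } @clobbered;
print "$lost lost, still have: @intact\n";   # prints "2 lost, still have: a b"
```

Because foreach aliases $_ to the array elements, the inner read loop ends up overwriting the caller's data.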

Michael Carman
Words of Wisdom: Make sure you "local $_;" before you "while(<FH>)" in anything outside the main script (e.g. in subroutines).
runrig
Using an explicit variable *does* test for definedness, try: perl -MO=Deparse -we'while (my $line=<>) {}'
ysth
@ysth: Thanks, fixed. I could have sworn that used to be true, but it clearly isn't now.
Michael Carman
+10  A: 

The perltrap manpage lists many traps for the unwary organized by type.

Michael Carman
+5  A: 

Constants can be redefined. A simple way to accidentally redefine a constant is to define a constant as a reference.

 use constant FOO => { bar => 1 };
 ...
 my $hash = FOO;
 ...
 $hash->{bar} = 2;

Now FOO is {bar => 2};

If you are using mod_perl (at least in 1.3) the new FOO value will persist until the module is refreshed.
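
A runnable sketch of the effect, plus one possible defence (taking a shallow copy before modifying); the copy approach is a suggestion, not something the constant pragma does for you:

```perl
use strict;
use warnings;
use constant FOO => { bar => 1 };

my $hash = FOO;                   # the same reference, not a copy
$hash->{bar} = 2;
my $after_write = FOO->{bar};     # the "constant" now holds 2

my %copy = %{ FOO() };            # shallow copy; FOO() avoids being parsed as %FOO
$copy{bar} = 99;
my $after_copy = FOO->{bar};      # unchanged by writes to the copy

print "$after_write $after_copy\n";   # prints "2 2"
```

A shallow copy only protects the top level; nested references inside the hash would still be shared.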

cwhite
Ouch! I haven't used "use constant" myself, but if I need something like it in future I will now look for something more "constant" in nature.
j_random_hacker
If I remember correctly, in C too, a constant pointer only means you can't make it point anywhere else - the actual content it points to can be changed. Seen from that angle, this behaviour is quite reasonable.
sundar
+1  A: 

Misspelling variable names... I once spent an entire afternoon troubleshooting code that wasn't behaving correctly only to find a typo on a variable name, which is not an error in Perl, but rather the declaration of a new variable.

ceretullis
This is a common mistake in many languages. Luckily, using the `strict` pragma helps immensely in ensuring that you're referring to existing variables (which are declared with `my` or `our`) rather than creating new ones.
pjf
Agreed, the error there is not starting your script with "use strict"
David Precious
Yep, always use copy-paste. Never retype.
Peter Mortensen
@David Precious: actually wasn't my script, I was doing maintenance work on a pre-existing script... They depended on NOT having "use strict". If I had to do it again, I would have corrected that first.
ceretullis
+14  A: 

This is a meta-answer. A lot of nasty gotchas are caught by Perl::Critic, which you can install and run from the command line with the perlcritic command, or (if you're happy to send your code across the Internet, and not be able to customise your options) via the Perl::Critic website.

Perl::Critic also provides references to Damian Conway's Perl Best Practices book, including page numbers. So if you're too lazy to read the whole book, Perl::Critic can still tell you the bits you should be reading.

pjf
For some value of "should", of course.
jrockway
+5  A: 

What values would you expect @_ to contain in the following scenario?

sub foo { } 

# empty subroutine called in parameters

bar( foo(), "The second parameter." ) ;

I would expect to receive in bar:

undef, "The second parameter."

But @_ contains only the second parameter, at least when testing with perl 5.8.8.

foo doesn't return anything, even undef, because it has no expressions to evaluate. I guess it's debatable whether this is a bug or "by design"...
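
A small sketch that counts the arguments instead of inspecting @_ by hand. count_args is a made-up helper, and forcing scalar context is just one way to get the undef placeholder:

```perl
use strict;
use warnings;

sub foo { }                            # empty subroutine, as above
sub count_args { return scalar @_ }    # hypothetical helper: how many arguments arrived?

# In list context foo() contributes nothing to the argument list.
my $collapsed = count_args( foo(), "The second parameter." );

# In scalar context foo() yields undef, which does occupy a slot.
my $padded = count_args( scalar(foo()), "The second parameter." );

print "$collapsed $padded\n";   # prints "1 2"
```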
Ether
foo() is running in list context, and it returns an empty list, which is collapsed. To see this, observe that "sub foo { return (); }" behaves the same way.
j_random_hacker
+5  A: 
andy
Is it correct to say `my` is distributive? I've heard it and \ are.
Vince
+5  A: 

Use of uninitialized value in concatenation...

This one drives me crazy. You have a print that includes a number of variables, like:

print "$label: $field1, $field2, $field3\n";

And one of the variables is undef. You consider this a bug in your program -- that's why you were using the "warnings" pragma. Perhaps your database schema allowed NULL in a field you didn't expect, or you forgot to initialize a variable, etc. But all the error message tells you is that an uninitialized value was encountered during a concat (.) operation. If only it told you the name of the variable that was uninitialized!

Since perl doesn't want to print the variable name in the error message for some reason, you end up tracking it down by setting a breakpoint (to look at which variable is undef), or adding code to check for the condition. Very annoying when it only happens one time out of thousands in a CGI script and you can't recreate it easily.

andy
I don't know if it's a Perl 5.10 thing, but the variable name prints for me ("Use of uninitialized value $blah in concatenation...", "...in pattern match...", etc.)
Kev
Of course, in that particular example, you could look at what's printed and the punctuation will tell you. e.g., "SomeLabel: 1, , 3" would be $field2 that's undef. (I'm assuming no empty strings, though... "" and undef would print the same.)
Dave Sherohman
As Kev pointed out, this error is much clearer in 5.10
mpeters
+3  A: 

Adding extra parentheses could never change the code's meaning, right? Right?

my @x = ( "A"  x 5 );      # @x contains 1 element, "AAAAA"
my @y = (("A") x 5 );      # @y contains 5 elements: "A", "A", "A", "A", "A"

Oh, that's right, this is Perl.

EDIT: Just for good measure, if x is being called in scalar context, then the parentheses don't matter after all:

my $z = ( "A"  x 5 );      # $z contains "AAAAA"
my $w = (("A") x 5 );      # $w contains "AAAAA" too

Intuitive.

j_random_hacker
This is well-documented. I know contexts can be confusing if you're new to Perl, but they are a fundamental part of learning the language. If you don't want to learn, though, nobody is going to make you. (I hope ;)
Adam Bellaire
I'm comfortable with contexts. Why doesn't x decide what context to supply to its left argument based on the context supplied to itself (a la reverse())?. Documented yes, but that doesn't stop it from being a gratuitous inconsistency.
j_random_hacker
My point is: extra non-precedence-adjusting parens around a subexpression should *never* change the semantics of the expression, or programmers become afraid to use parens at all. Most of Perl agrees that $expr is always the same as ((($expr))) -- why did they have to break that useful rule here?
j_random_hacker
I understand what you're saying, but I just don't see it as a gotcha. That is, I don't see anyone putting the parens in by accident and have it do something unexpected. There are lots of design decisions that were made in Perl that are non-intuitive to some people, though. It's subjective.
Adam Bellaire
+1 upvote for adding to the list anyway
Adam Bellaire
Thanks for the vote. I agree it's subjective. I've personally been bitten by it a couple of times, and it makes me edgy when what I think of as a safe assumption about language syntax is violated.
j_random_hacker
BTW I totally agree about there being some non-intuitive design decisions in Perl (passing/returning filehandles to functions used to be so horrible I restructured programs to avoid doing it). But despite my complaints I still use Perl for most things... :)
j_random_hacker
This makes perfect sense to me. In the first, you're repeating a scalar. In the second, you're repeating a list. I think the issue here is more that parens are used to construct lists in addition to their traditional precedence-setting role.
Dave Sherohman
@Dave: Yes. What grates with me is that sometimes the parens create list context, sometimes they don't. E.g. "return (42, 43, 44);" in a function does NOT create list context if the function is called in scalar context.
j_random_hacker
+5  A: 

Graeme Perrow's answer was good, but it gets even better!

Given a typical function that returns a nice list in list context, you might well ask: What will it return in scalar context? (By "typical," I mean the common case in which the documentation doesn't say, and we assume it doesn't use any wantarray funny business. Maybe it's a function you wrote yourself.)

sub f { return ('a', 'b', 'c'); }
sub g { my @x = ('a', 'b', 'c'); return @x; }

my $x = f();           # $x is now 'c'
my $y = g();           # $y is now 3

The context a function is called in is propagated to return statements in that function.
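
One way to stay independent of how the function builds its return value is to always call it in list context and then pick what you need. A sketch using the same f and g:

```perl
use strict;
use warnings;

sub f { return ('a', 'b', 'c'); }
sub g { my @x = ('a', 'b', 'c'); return @x; }

# Parentheses on the left force list context; only the first value is kept.
my ($first_f) = f();
my ($first_g) = g();

# Assigning to an empty list and then to a scalar yields the element count.
my $count_f = () = f();
my $count_g = () = g();

# A list slice picks the last element either way.
my $last_f = (f())[-1];
my $last_g = (g())[-1];

print "$first_f $first_g $count_f $count_g $last_f $last_g\n";   # prints "a a 3 3 c c"
```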

I guess the caller was wrong to want a simple rule of thumb to enable efficient reasoning about code behaviour. You're right, Perl, it's better for the caller's character to grovel through the called function's source code each time.

j_random_hacker
Nice gotcha, especially when writing your own functions. But your advice for dealing with it is probably not the best. Rather than reading someone else's source, don't rely on undocumented behavior. If you need a scalar, get the list first and then get your scalar (length, first, etc.) from that.
Adam Bellaire
Making your code rely on rules of thumb with undocumented features couples your code to their implementation. If their code changes syntax, your code will break. Just stick to the documented behavior to protect your code from this situation. This is known as reducing coupling.
Adam Bellaire
@Adam: I totally agree: one should never let one's code depend on undocumented behaviour. And as you say, getting the full list first is the right way to deal with it. My point is that, had Perl been designed differently, the non-obvious workaround of getting the list first would not be necessary.
j_random_hacker
+3  A: 

Comparing strings using == and != instead of eq and ne. For instance:

$x = "abc";
if ($x == "abc") {
    # do something
}

Instead of:

$x = "abc";
if ($x eq "abc") {
    # do something
}
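
To show why this bites, here is a sketch comparing "abc" against a different string ("xyz" is made up for the example). The numeric warning is switched off only so the demo runs quietly; normally that warning is what gives the mistake away:

```perl
use strict;
use warnings;

my $x = "abc";
my ($numeric_match, $string_match);

{
    no warnings 'numeric';   # hide "Argument ... isn't numeric" for the demo
    # Both strings numify to 0, so == reports them as equal.
    $numeric_match = ($x == "xyz") ? 1 : 0;
}

# eq compares the strings themselves.
$string_match = ($x eq "xyz") ? 1 : 0;

print "==: $numeric_match, eq: $string_match\n";   # prints "==: 1, eq: 0"
```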
Nathan Fellman
+1  A: 

Forgetting to prepend the directory path to the results of readdir before doing tests on those results. Here's an example:

#!/usr/bin/env perl
use strict;
use warnings;

opendir my $dh, '/path/to/directory/of/interest'
  or die "Can't open '/path/to/directory/of/interest' for reading: [$!]";

my @files = readdir $dh; # Bad in many cases; see below
# my @files = map { "/path/to/directory/of/interest/$_" } readdir $dh;

closedir $dh or die "Can't close /path/to/directory/of/interest: [$!]";

for my $item (@files) {
  print "File: $item\n" if -f $item;
  # Nothing happens. No files? That's odd...
}

# Scratching head...let's see...
use Data::Dumper;
print Dumper @files;
# Whoops, there it is...

This gotcha is mentioned in the documentation for readdir, but I think it's still a pretty common mistake.
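
A self-contained sketch of the fix, assuming a scratch directory created with File::Temp and a made-up file name, and using File::Spec to join the directory and entry name:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Scratch directory with one file in it; removed again at exit.
my $dir  = tempdir(CLEANUP => 1);
my $name = "readdir-gotcha-$$.txt";   # hypothetical name, unlikely to exist in the cwd
my $path = File::Spec->catfile($dir, $name);
open my $fh, '>', $path or die "Can't create $path: $!";
close $fh or die "Can't close $path: $!";

opendir my $dh, $dir or die "Can't open '$dir' for reading: $!";
my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh or die "Can't close '$dir': $!";

# Bare names are resolved against the current working directory.
my @found_bare = grep { -f $_ } @names;

# Joining each name to the directory tests the entries that were actually read.
my @paths      = map { File::Spec->catfile($dir, $_) } @names;
my @found_full = grep { -f $_ } @paths;

printf "bare: %d, full: %d\n", scalar @found_bare, scalar @found_full;
```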

Telemachus
+2  A: 

Unary minus with "foo" creates "-foo":

perl -le 'print -"foo" eq "-foo" ? "true" : "false"'

This only works if the first character matches /[_a-zA-Z]/. If the first character is a "-" then it changes the first character to a "+", and if the first character is a "+" then it changes the first character to a "-". If the first character matches /[^-+_a-zA-Z]/ then it attempts to convert the string to a number and negates the result.

perl -le '
    print -"foo";
    print -"-foo";
    print -"+foo";
    print -"\x{e9}"; #e acute is not in the accepted range
    print -"5foo";   #same thing for 5
'

The code above prints

-foo
+foo
-foo
-0
-5

This feature mostly exists to allow people to say things like

my %options = (
    -depth  => 5,
    -width  => 2,
    -height => 3,
);
Chas. Owens
Yikes, I didn't know unary minus applied to a string did anything magic. I wonder what else is hiding away in perlop...
j_random_hacker
A: 

Modifying the array you're looping on in a for(each) as in:

my @array = qw/a b c d e f g h/;

for ( @array ) {
    my $val = shift @array;
    print $val, "\n";
}

foreach gets confused and doesn't do what you would expect. perlsyn warns against adding or removing elements of an array inside a loop over it.
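
One way to sidestep this, sketched under the assumption that the goal was to consume the array from the front: loop on the array's size rather than iterating over it. The @seen array is added only to record the order:

```perl
use strict;
use warnings;

my @array = qw/a b c d e f g h/;
my @seen;

# The condition is re-evaluated on every pass, so shrinking the array is safe.
while (@array) {
    my $val = shift @array;
    push @seen, $val;
}

print "@seen\n";   # prints "a b c d e f g h"
```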

kemp