In Perl, when you have a nested data structure, you are allowed to omit the dereferencing arrows at the second and deeper levels of nesting. In other words, the following two syntaxes are identical:

my $hash_ref = { 1 => [ 11, 12, 13 ], 3 => [31, 32] };

my $elem1 = $hash_ref->{1}->[1]; 
my $elem2 = $hash_ref->{1}[1]; # exactly the same as above
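As a quick sanity check, here is a small runnable sketch that builds the structure from the question and reads the same element both ways. Nothing beyond the question's own data is assumed.

```perl
use strict;
use warnings;

# The data structure from the question: a hash ref whose values are array refs.
my $hash_ref = { 1 => [ 11, 12, 13 ], 3 => [ 31, 32 ] };

# Arrow between the two subscripts written out explicitly...
my $elem1 = $hash_ref->{1}->[1];

# ...and omitted; Perl supplies it implicitly between adjacent subscripts.
my $elem2 = $hash_ref->{1}[1];

print "$elem1 $elem2\n";    # prints "12 12"
```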

Now, my question is, is there a good reason to choose one style over the other?

It seems to be a popular bone of stylistic contention (just on SO, I accidentally bumped into two such discussions in the space of 5 minutes).

So far, almost none of the usual suspects says anything definitive:

  • perldoc merely says "you are free to omit the pointer dereferencing arrow".
  • Conway's "Perl Best Practices" says "whenever possible, dereference with arrows", but that appears to apply only to dereferencing the main reference, not to the optional arrows at the second and deeper levels of a nested data structure.
  • "Mastering Perl for Bioinformatics" author James Tisdall doesn't express a very firm preference either:

    "The sharp-witted reader may have noticed that we seem to be omitting arrow operators between array subscripts. (After all, these are anonymous arrays of anonymous arrays of anonymous arrays, etc., so shouldn't they be written $array->[$i]->[$j]->[$k]?) Perl allows this; only the arrow operator between the variable name and the first array subscript is required. It makes things easier on the eyes and helps avoid carpal tunnel syndrome. On the other hand, you may prefer to keep the dereferencing arrows in place, to make it clear you are dealing with references. Your choice."

  • UPDATE: "Intermediate Perl", as per its co-author brian d foy, recommends omitting the arrows. See brian's full answer below.
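To illustrate the point in the quoted passage that only the first arrow (between the variable name and the first subscript) is required: dropping that one changes which variable Perl looks at. The variable names and contents below are made up for illustration.

```perl
use strict;
use warnings;

my $array_ref = [ [ 1, 2 ], [ 3, 4 ] ];    # a reference to an array of arrays
my @array_ref = ( 'unrelated', 'array' );  # a separate array that happens to share the name

# With the leading arrow, the scalar $array_ref is dereferenced;
# the arrow between [1] and [0] may be omitted.
my $via_ref = $array_ref->[1][0];          # 3

# Without the leading arrow, Perl reads element 0 of the array @array_ref.
my $via_array = $array_ref[0];             # 'unrelated'

print "$via_ref $via_array\n";
```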

Personally, I'm on the side of "always put arrows in, since it's more readable and obvious they're dealing with a reference".

UPDATE: To be more specific about readability: in a deeply nested expression where the subscripts are themselves expressions, the arrows help to "visually tokenize" the expression by separating the subscripts from one another more obviously.

A:

Unless you really enjoy typing or excessively long lines, don't use the arrows when you don't need them. Subscripts next to subscripts imply references, so the competent programmer doesn't need extra clues to figure that out.

I disagree that it's more readable to have extra arrows. It's definitely unconventional to have them moving the interesting parts of the term further away from each other.

In Intermediate Perl, where we actually teach references, we tell you to omit the unnecessary arrows.

Also, remember there is no such thing as "readability". There is only what you (and others) have trained your eyes to recognize as patterns. You don't read things character-by-character and then figure out what they mean. You see groups of things that you've seen before and recognize them. At the base syntax level that you are talking about, your "readability" is just your ability to recognize patterns. Patterns become easier to recognize the more you use them, so it's not surprising that what you do now is more "readable" to you. New styles seem odd at first, but eventually become more recognizable, and thus more "readable".

The example you give in your comments isn't hard to read because it lacks arrows. It's still hard to read with arrows:

 $expr1->[$sub1{$x}]{$sub2[$y]-33*$x3}{24456+myFunct($abc)}
 $expr1->[$sub1{$x}]->{$sub2[$y]-33*$x3}->{24456+myFunct($abc)}

I write that sort of code like this, using these sorts of variable names to remind the next coder about the sort of container each level is:

my $index = $sub1{$x};
my $key1  = $sub2[$y]-33*$x3;
my $key2  = 24456+myFunct($abc);

$expr1->[ $index ]{ $key1 }{ $key2 };
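Here is a runnable version of that refactoring, showing that the one-line lookup and the named-variable lookup reach the same element. The contents of %sub1, @sub2, $x, $y, $x3, $abc, $expr1 and the body of myFunct are all hypothetical stand-ins chosen so the keys work out; only the expression shapes come from the answer.

```perl
use strict;
use warnings;

# Hypothetical inputs standing in for the ones in the example.
my %sub1 = ( x => 0 );
my @sub2 = ( 100 );
my ( $x, $y, $x3, $abc ) = ( 'x', 0, 2, 4 );
sub myFunct { return $_[0] * 10 }    # hypothetical helper

# Hypothetical nested structure: array of hashes of hashes.
my $expr1 = [ { 34 => { 24496 => 'found it' } } ];

# One-line form, with the optional arrows omitted.
my $inline = $expr1->[$sub1{$x}]{$sub2[$y]-33*$x3}{24456+myFunct($abc)};

# Same lookup, with each subscript computed into a named variable first.
my $index = $sub1{$x};                # 0
my $key1  = $sub2[$y]-33*$x3;         # 100 - 66 = 34
my $key2  = 24456+myFunct($abc);      # 24456 + 40 = 24496
my $named = $expr1->[ $index ]{ $key1 }{ $key2 };

print "$inline / $named\n";           # prints "found it / found it"
```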

To make that even better, hide the details in a subroutine (that's what they are there for :) so you never have to play with that mess of a data structure directly. This is more readable than any of them:

  my $value = get_value( $index, $key1, $key2 );

  my $value = get_value(
      $sub1{$x},
      $sub2[$y]-33*$x3,
      24456+myFunct($abc)
  );
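The answer doesn't show get_value itself; the following is a minimal sketch of what such an accessor could look like, assuming a structure shaped like $expr1 in the example (array of hashes of hashes). The data and the subroutine body are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical nested data: array of hashes of hashes.
my $expr1 = [ { 34 => { 24496 => 'found it' } } ];

# Hypothetical accessor: callers pass the three subscripts and never
# touch the nested structure directly. Note that plain lookups like this
# autovivify intermediate levels when a key is missing.
sub get_value {
    my ( $index, $key1, $key2 ) = @_;
    return $expr1->[$index]{$key1}{$key2};
}

my $value = get_value( 0, 34, 24496 );
print "$value\n";    # prints "found it"
```

If the layout of the data structure ever changes, only get_value needs to be updated, not every caller.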
brian d foy
Even though brian doesn't need my vote, I really like the "interesting parts of the term" point.
Axeman
brian - any input on my point made to Jefromi? E.g. extra spacing between subscripts helps "visually tokenize" them - it matters little when writing $a->[2]{c} but a lot more with more involved expression like "$expr1->[$sub1{$x}]{$sub2[$y]-33*$x3}{24456+myFunct($abc)}" (NOT my actual code!)
DVK
P.S. Totally unrelated - but do you have any answers for this Meta question? http://meta.stackoverflow.com/questions/42797/randal-schwartz-on-so-but-not-in-perl
DVK
I updated the question to reflect your answer. Thanks!
DVK
@DVK: With that kind of nesting, I'd say you get your clarity from spaces inside the brackets, not arrows between them.
Jefromi
@Jefromi - I always put spaces inside brackets where subscript value is more complicated than a single variable - here I was worried about running out of comment space and consciously skipped them... bad me.
DVK
Agreed, though we can't always guarantee that a "competent" programmer will be maintaining our code. While it is a slippery slope towards obtuse and unnecessary verbosity, I can understand why some might feel the need to leave in the "extra clues" from time to time.
Adam Bellaire
Maybe you guys can take this discussion somewhere else. It's way off-topic for a comment to my answer.
brian d foy
OK, I'll accept this answer despite disagreeing with it - mostly because of the "Intermediate Perl" mention - but I must say that none of the answers really provided a good enough reason to choose one style over the other, mostly reasons NOT to choose one. I suspect there isn't a clear-cut reason to choose in this case, as in many stylistic disputes :(
DVK
OK, I'll accept this answer with no reservations now - I still somewhat disagree (I think the second line in the example I gave is MUCH more readable, readable enough not to need the extra options) - BUT I find that the two replacement options promote excellent code style and as such turn this into a great answer!
DVK
A: 

I have always written all of the arrows. I agree with you: they separate the different subscripts better. Plus, I use curly braces for regular expressions, so to me {foo}{bar} looks like a substitution: s{foo}{bar} stands out more from $s->{foo}->{bar} than from $s->{foo}{bar}.

I don't think it's a big thing though, reading code that omits the extra arrows is not a problem (as opposed to any indentation that's not the one I use ;--)

mirod
A:

Since the -> arrow is mandatory for method calls, I prefer to use it only to call code. So I would write the following:

$object->method;
$coderef->();
$$dispatch{name}->();

$$arrayref[1];
$$arrayref[1][5];
@$arrayref[1 .. 5];
@$arrayref;
$$hashref{foo};
$$hashref{foo}{bar};
@$hashref{qw/foo bar/};
%$hashref;
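A small runnable sketch checking that the sigil-style element accesses listed above pick out the same values as the arrow style, plus the slice and whole-container forms. The contents of $arrayref and $hashref are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data to exercise the forms listed above.
my $arrayref = [ 'a', [ 0, 1, 2, 3, 4, 5 ] ];
my $hashref  = { foo => { bar => 'baz' }, bar => 'qux' };

# Each pair holds the sigil-style form and the equivalent arrow-style form.
my @pairs = (
    [ $$arrayref[0],       $arrayref->[0]       ],    # 'a'
    [ $$arrayref[1][5],    $arrayref->[1][5]    ],    # 5
    [ $$hashref{foo}{bar}, $hashref->{foo}{bar} ],    # 'baz'
);
for my $pair (@pairs) {
    print "same: $pair->[0]\n" if $pair->[0] eq $pair->[1];
}

# Slice and whole-container dereferences in the same sigil style.
my @hash_slice = @$hashref{qw/foo bar/};    # values for keys foo and bar
my $n_elems    = scalar @$arrayref;          # number of top-level elements
```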

Two sigils back to back always means a dereference, and the structure remains consistent across all forms of dereferencing (scalar, slice, all).

It also keeps all parts of the variable "together" which I find more readable, and it's shorter :)

Eric Strom
But not all dereferences have two sigils back-to-back :)
brian d foy
Exactly, and since there are choices for dereferencing, and not for method calls... "I think visual metaphors are very important. How it looks. Different things should look different. Similar things should look similar." ~ Larry Wall. Calling code is very different from other forms of dereferencing.
Eric Strom
Consider ${$hash{a}}{b} and $$hash{a}{b}, and get the inexperienced Perler to tell you why those are different. It's much easier to make those things look different with the leading arrow. As Larry said, different things should look different.
brian d foy
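A small sketch of the distinction brian describes: ${$hash{a}}{b} starts from an element of the hash %hash, while $$hash{a}{b} starts from the scalar $hash. The variable contents are made up for illustration.

```perl
use strict;
use warnings;

# %hash is a plain hash whose value under 'a' is a hash reference.
my %hash = ( a => { b => 'from %hash' } );

# $hash is an unrelated scalar holding a hash reference.
my $hash = { a => { b => 'from $hash' } };

# ${$hash{a}}{b}: take $hash{a} from %hash, dereference it, get key b.
# Same as $hash{a}{b} or $hash{a}->{b}.
my $first = ${ $hash{a} }{b};

# $$hash{a}{b}: dereference the scalar $hash first.
# Same as $hash->{a}{b}.
my $second = $$hash{a}{b};

print "$first\n$second\n";    # prints "from %hash" then "from $hash"
```

Written with a leading arrow, the second lookup becomes $hash->{a}{b}, which looks clearly different from $hash{a}{b}.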