I cannot decide which approach is more (1) idiomatic Perl, (2) efficient, or (3) "clear".

Let me explain with code. First, I can do this:

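sub something {
  my @references;
  ...
  my $ref = {};    # a fresh hash for every record, otherwise all entries share one hash
  $ref->{size}   = 10;
  $ref->{name}   = "Foo";
  $ref->{volume} = 100;
  push (@references, $ref);
  ...
  return @references;
}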

Or I can do this:

sub something {
  my (@names, %sizes, %volumes);
  ...
  push (@names, "Foo");
  $sizes{Foo}   =  10;
  $volumes{Foo} = 100;
  ...
  return (\@names, \%sizes, \%volumes);
}

Both do essentially the same thing. The important point is that I need the array, because I need to keep the order.

I know there is always more than one way to do it, but which of these two would you prefer?
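To make the comparison concrete, here is a minimal runnable sketch of both shapes. The subroutine names and the second record (Bar) are illustrative assumptions, not from the question. Either way, reading the data back yields the records in their original order: the first approach through the returned list itself, the second through @names.

```perl
use strict;
use warnings;

# Approach 1 (hypothetical name): a list of hash references, one per record.
sub records_as_hashes {
    return (
        { name => 'Foo', size => 10, volume => 100 },
        { name => 'Bar', size => 20, volume => 250 },   # illustrative second record
    );
}

# Approach 2 (hypothetical name): ordered names plus parallel lookup hashes.
sub records_as_parallel {
    my @names   = ('Foo', 'Bar');
    my %sizes   = (Foo => 10,  Bar => 20);
    my %volumes = (Foo => 100, Bar => 250);
    return (\@names, \%sizes, \%volumes);
}

# Approach 1 keeps order through the returned list itself.
my @from_hashes =
    map { join ':', $_->{name}, $_->{size}, $_->{volume} } records_as_hashes();

# Approach 2 keeps order through @names; the hashes only supply values.
my ($names, $sizes, $volumes) = records_as_parallel();
my @from_parallel =
    map { join ':', $_, $sizes->{$_}, $volumes->{$_} } @$names;

print join(', ', @from_hashes),   "\n";   # Foo:10:100, Bar:20:250
print join(', ', @from_parallel), "\n";   # Foo:10:100, Bar:20:250
```

The second version needs three structures kept in sync by hand to say what the first says with one.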

A:

I vastly prefer the former. It keeps one "packet" of data (size, name, volume) together and makes for much more readable code.

Thomas
A:

Instead of thinking in meaningless terms such as "something", think about and phrase the issue in concrete terms. In this case, you seem to be returning a list of objects that have name, size, and volume attributes. Thought of that way, there is no reason even to consider the second method.

You can think about optimizations later if you run into problems, but even then you would probably gain more from Memoize than from exploding your records into parallel data structures.
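As a rough illustration of the Memoize suggestion: Memoize is a core Perl module whose memoize() function replaces a named subroutine with a wrapper that caches results by argument. The subroutine page_size below is a hypothetical stand-in for an expensive computation; the counter shows that its body runs only once per distinct argument.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

# Hypothetical expensive function; $calls counts how often the body really runs.
sub page_size {
    my ($name) = @_;
    $calls++;
    return length($name) * 10;
}

# Wrap page_size in a cache: repeated calls with the same argument
# return the stored result instead of re-running the body.
memoize('page_size');

my @sizes = map { page_size($_) } qw(Foo Foo Foo Bar);
print "@sizes\n";                   # 30 30 30 30
print "body ran $calls times\n";    # body ran 2 times
```

Memoize only suits functions whose result depends solely on their arguments; the counter here is a side effect kept purely for demonstration.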

One efficiency improvement I will recommend is to return a reference from this subroutine:

sub get_objects {
    my @ret;

    while ( 'some condition' ) {    # placeholder: a non-empty string is always true, so use a real test here
        #  should I return this one?
        push @ret, {
            name => 'Foo',
            size => 10,
            volume => 100,
        };
    }

    return \@ret;
}
Sinan Ünür
Thank you. I thought the concrete terms would just complicate quite simple questions. By the way, I want to return a list of HTML pages that have a title, link, HTML source, and keywords.
Karel Bílek
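Applied to the HTML pages from the comment above, the same shape might look like the sketch below. The subroutine name get_pages, the field names, the example.com URLs, and storing keywords as an array reference are all assumptions for illustration. As suggested in the answer, the subroutine returns a single array reference.

```perl
use strict;
use warnings;

# Hypothetical variant of get_objects() for HTML pages:
# each page is one hash with its fields kept together.
sub get_pages {
    my @pages;
    for my $url ('http://example.com/a', 'http://example.com/b') {
        push @pages, {
            title    => "Page at $url",
            link     => $url,
            html     => "<html><title>Page at $url</title></html>",
            keywords => [ 'example', 'perl' ],   # assumed: keywords as a list
        };
    }
    return \@pages;    # one reference instead of the whole list
}

my $pages = get_pages();
for my $page (@$pages) {
    printf "%s -> %s (%s)\n", $page->{title}, $page->{link},
        join(', ', @{ $page->{keywords} });
}
```

Callers dereference the result with @$pages, and the pages come back in the order they were pushed.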
A:

Keep your related data together. The only reason to create big parallel arrays is that you are forced to.

If you are concerned about speed and memory usage, you can store each record in an array instead of a hash and use constants as the indexes of your named fields:

use constant { SIZE => 0, NAME => 1, VOLUME => 2, };

sub something {
  my @references;
  ...

  my $ref = [];    # a fresh array for every record
  $ref->[SIZE]   = 10;
  $ref->[NAME]   = "Foo";
  $ref->[VOLUME] = 100;

  push @references, $ref;

  ...
  return @references;
}

I've also added some whitespace to make the code easier to read.

If I have a lot of parameters with validation rules or deep data structures, I tend to turn to objects to simplify my code by tying the logic about the data to the data. Of course, OOP exacts a speed penalty, but I have only rarely seen that become a problem.
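A minimal sketch of what tying the logic about the data to the data can look like with plain Perl objects and no CPAN modules. The class name Item and its validation rules are hypothetical; Moose or Mouse would generate the constructor, accessors, and type checks for you instead of writing them by hand.

```perl
use strict;
use warnings;

# Hypothetical class: validation lives with the data it protects.
package Item;
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;

    # Every caller gets the same checks without repeating them.
    for my $field (qw(name size volume)) {
        croak "missing '$field'" unless defined $args{$field};
    }
    for my $field (qw(size volume)) {
        croak "$field must be a non-negative integer"
            unless $args{$field} =~ /\A\d+\z/;
    }

    my %self = map { $_ => $args{$_} } qw(name size volume);
    return bless \%self, $class;
}

# Read-only accessors.
sub name   { $_[0]{name} }
sub size   { $_[0]{size} }
sub volume { $_[0]{volume} }

package main;

my $item = Item->new(name => 'Foo', size => 10, volume => 100);
print $item->name, " has size ", $item->size, "\n";    # Foo has size 10

# Invalid data is rejected at construction time.
my $ok = eval { Item->new(name => 'Bad', size => -1, volume => 5); 1 };
print "rejected: $@" unless $ok;
```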

For quick and dirty OOP, I use Class::Struct, which has many flaws. When I need type checking, I use Moose, or Mouse when memory or startup speed is a big concern.
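For comparison, a sketch using the core Class::Struct module, which generates a constructor and one accessor per declared field. The class name Page and its fields are hypothetical, borrowed from the HTML-page use case in the comments above. Scalar accessors read the field with no argument and set it with one; array accessors take an index.

```perl
use strict;
use warnings;
use Class::Struct;

# Declare a hypothetical Page class: three scalar fields and one array field.
struct( Page => {
    title    => '$',
    link     => '$',
    html     => '$',
    keywords => '@',
});

# Array fields are initialized with an array reference.
my $page = Page->new(
    title    => 'Foo',
    link     => 'http://example.com/foo',
    html     => '<html></html>',
    keywords => [ 'perl', 'data' ],
);

print $page->title, "\n";          # Foo
$page->title('Bar');               # with an argument, the accessor sets the field
print $page->title, "\n";          # Bar
print $page->keywords(1), "\n";    # data
```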

daotoad
A: 

Both ways might be useful for different problems. If you are always going to access all of the information together, just keep it together. For instance, in your case you want to track the name, title, and size of a web page. You're probably working with all three of those things at the same time, so keep them together as an array of hash references.

Other times, you might break data into different things that you use separately and want to look up independently of the other columns. In those cases, separate hashes might make sense.
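A small sketch of that distinction, with illustrative data: the records stay together in an ordered array, and separate hashes are built from them only for the lookups you need to make independently.

```perl
use strict;
use warnings;

# Ordered records, kept together (illustrative values).
my @pages = (
    { name => 'Foo', size => 10, volume => 100 },
    { name => 'Bar', size => 20, volume => 250 },
);

# Index by name: finds a record without scanning the array.
# The values are the same hash references, so no record is copied.
my %by_name = map { $_->{name} => $_ } @pages;

# A separate "column" hash when only one attribute is ever looked up.
my %size_of = map { $_->{name} => $_->{size} } @pages;

print $by_name{Bar}{volume}, "\n";                    # 250
print $size_of{Foo}, "\n";                            # 10
print join(', ', map { $_->{name} } @pages), "\n";    # Foo, Bar
```

Because the index is derived from the array, it has to be rebuilt (or updated) whenever records are added or removed.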

brian d foy