views:

259

answers:

5
+3  Q: 

Pimp my Perl code

I'm an experienced developer, but not in Perl. I usually learn Perl to hack a script, then I forget it again until the next time. Hence I'm looking for advice from the pros.

This time around I'm building a series of data analysis scripts. Grossly simplified, the program structure is like this:

01 my $config_var = 999;

03 my $result_var = 0;

05 foreach my $file (@files) {
06   open(my $fh, $file);
07   while (<$fh>) {
08     &analyzeLine($_);
09   }
10 }

12 print "$result_var\n";

14 sub analyzeLine ($) {
15   my $line = shift(@_);
16   $result_var = $result_var + calculatedStuff;
17 }

In real life, there are up to about half a dozen different config_vars and result_vars.

These scripts differ mostly in the values assigned to the config_vars. The main loop will be the same in every case, and analyzeLine() will be mostly the same but could have some small variations.

I can accomplish my purpose by making N copies of this code, with small changes here and there; but that grossly violates all kinds of rules of good design. Ideally, I would like to write a series of scripts containing only a set of config var initializations, followed by

do theCommonStuff;

Note that config_var (and its siblings) must be available to the common code, as must result_var and its lookalikes, upon which analyzeLine() does some calculations.

Should I pack my "common" code into a module? Create a class? Use global variables?

While not exactly code golf, I'm looking for a simple, compact solution that will allow me to DRY and write code only for the differences. I think I would rather not drive the code off a huge table containing all the configs, and certainly not adapt it to use a database.

Looking forward to your suggestions, and thanks!


Update

Since people asked, here's the real analyzeLine:

# Update stats with time and call data in one line.
sub processLine ($) {
  my $line = shift(@_);
  return unless $line =~ m/$log_match/;
  # print "$1 $2\n";
  my ($minute, $function) = ($1, $2);
  $startMinute = $minute if not $startMinute;
  $endMinute = $minute;
  if ($minute eq $currentMinute) {
    $minuteCount = $minuteCount + 1;
  } else {
    if ($minuteCount > $topMinuteCount) {
      $topMinute = $currentMinute;
      $topMinuteCount = $minuteCount;
      printf ("%40s %s : %d\n", '', $topMinute, $topMinuteCount);
    }
    $totalMinutes = $totalMinutes + 1;
    $totalCount = $totalCount + $minuteCount;
    $currentMinute = $minute;
    $minuteCount = 1;
  }
}

Since these variables are largely interdependent, I think a functional solution with separate calculations won't be practical. I apologize for misleading people.

+5  A: 

Go ahead and create a class hierarchy. Your task is an ideal playground for OOP style of programming. Here's an example:

package Common;
sub new{
  my $class=shift;
  my $this=bless{},$class;
  $this->init();
  return $this;
}
sub init{}
sub theCommonStuff(){ 
  my $this=shift;
  for(1..10){ $this->analyzeLine($_); }
}
sub analyzeLine(){
  my($this,$line)=@_;
  $this->{'result'}.=$line;
}

package Special1;
our @ISA=qw/Common/;
sub init{
  my $this=shift;
  $this->{'sep'}=',';   # special param: separator
}
sub analyzeLine(){      # modified logic
  my($this,$line)=@_;
  $this->{'result'}.=$line.$this->{'sep'};
}

package main;
my $c = new Common;
my $s = new Special1;
$c->theCommonStuff;
$s->theCommonStuff;
print $c->{'result'}."\n";
print $s->{'result'}."\n";
catwalk
Sounds and looks good, thank you! I can put these packages in separate files, each named like the package, right?
Carl Smotricz
Yes, you can put them in separate files named like Common.pm, Special1.pm and `use` them from your main program. Just add `1;` at the end of each .pm file.
catwalk
Please do not use and do not recommend indirect object syntax, however superficially appealing it is to be able to write `new Common` rather than `Common->new`.
Sinan Ünür
After all the effort Sinan poured into his answer, it's almost a shame to go with this solution. It does, however, have the appeal of being easier for me to grasp and implement. I will be sure to use the "better" `new` syntax though.
Carl Smotricz
@Carl Smotricz: You asked the question, so don't feel bad about choosing one answer over the other. Just note that @catwalk accesses the internals of the object (violates encapsulation) instead of using getters and setters and the only reason I used `Class::Accessor::Faster` was to avoid having to write accessors ;-)
Sinan Ünür
Oh, of course, allowing the user of the module to specify a sub to do the actual "analysis" per line gives you flexibility and separation between scaffolding and actual work.
Sinan Ünür
OK about the accessors. One reason I'm doing this in Perl is to avoid some of the formality I'd otherwise deal with in Java. I don't mind violating encapsulation in a one-off throwaway solution, I mainly didn't want to create 2 dozen copies of my code. In other words, I'm knowingly taking my code quality from "burn on sight" to "less than a shining example" - a dramatic improvement! :)
Carl Smotricz
For excellent OOP goodness with less boilerplate, check out Moose. http://moose.perl.org/
daotoad
A: 

why not create a function and using $config_var and $result_var as parameters?

Dyno Fu
There are several of each, and I'd have to pass references to the `result_var`s. Also, if the function is in a separate file, how to I access it from the "main" script?
Carl Smotricz
+2  A: 

If all the common code is in one function, a function taking your config variables as parameters, and returning the result variables (either as return values, or as in/out parameters), will do. Otherwise, making a class ("package") is a good idea, too.

sub common_func {
    my ($config, $result) = @_;
    # ...
    $result->{foo} += do_stuff($config->{bar});
    # ...
}

Note in the above that both the config and result are hashes (actually, references thereto). You can use any other data structure that you feel will suit your goal.

Chris Jester-Young
Why the prototype?
Sinan Ünür
I prototype out of habit, but yes, it's optional.
Chris Jester-Young
@Chris Jester-Young See http://www.perlmonks.org/?node_id=406231 and http://stackoverflow.com/questions/1719990/perl-subroutine-call/1720052#1720052
Sinan Ünür
I don't know if the -1 is from you, and if it's over the prototype, but if so, a retraction of the downvote would be most appreciated. (If it's for something else, I'd love to hear about it.) :-)
Chris Jester-Young
@Chris Jester-Young: Nope, it was just the prototype issue. I was just away from SO for a while.
Sinan Ünür
@Sinan: Thanks, much appreciated!
Chris Jester-Young
+10  A: 

Two comments: First, don't post line numbers as they make it more difficult than necessary to copy, paste and edit. Second, don't use &func() to invoke a sub. See perldoc perlsub:

A subroutine may be called using an explicit & prefix. The & is optional in modern Perl, ... Not only does the & form make the argument list optional, it also disables any prototype checking on arguments you do provide.

In short, using & can be surprising unless you know what you are doing and why you are doing it.

Also, don't use prototypes in Perl. They are not the same as prototypes in other languages and, again, can have very surprising effects unless you know what you are doing.

Do not forget to check the return value of system calls such as open. Use autodie with modern perls.

For your specific problem, collect all configuration variables in a hash. Pass that hash to analyzeLine.

#!/usr/bin/perl

use warnings; use strict;
use autodie;

my %config = (
    frobnicate => 'yes',
    machinate  => 'no',
);

my $result;
$result += analyze_file(\%config, $_) for @ARGV;

print "Result = $result\n";

sub analyze_file {
    my ($config, $file) = @_;

    my $result;

    open my $fh, '<', $file;
    while ( my $line = <$fh> ) {
        $result += analyze_line($config, $line);
    }

    close $fh;

    return $result;
}

sub analyze_line {
    my ($line) = @_;
    return length $line;
}

Of course, you will note that $config is being passed all over the place, which means you might want to turn this in to a OO solution:

#!/usr/bin/perl

package My::Analyzer;

use strict; use warnings;

use base 'Class::Accessor::Faster';

__PACKAGE__->follow_best_practice;
__PACKAGE__->mk_accessors( qw( analyzer frobnicate machinate ) );

sub analyze_file {
    my $self = shift;
    my ($file) = @_;

    my $result;

    open my $fh, '<', $file;
    while ( my $line = <$fh> ) {
        $result += $self->analyze_line($line);
    }

    close $fh;

    return $result;
}

sub analyze_line {
    my $self = shift;
    my ($line) = @_;
    return $self->get_analyzer->($line);
}

package main;

use warnings; use strict;
use autodie;

my $x = My::Analyzer->new;

$x->set_analyzer(sub {
        my $length; $length += length $_ for @_; return $length;
});
$x->set_frobnicate('yes');
$x->set_machinate('no');


my $result;
$result += $x->analyze_file($_) for @ARGV;

print "Result = $result\n";
Sinan Ünür
Thanks for the many thoughtful comments! I'd included line numbers to make it easier to talk about the code, as I had expected a more abstract discussion.
Carl Smotricz
Carl Smotricz
I am indeed using `or die` on my `open`, also on `opendir`, thanks. Again, sorry about oversimplifying my example. Nix on the `autodie`, my perl environments range from 5.005 to 5.10.
Carl Smotricz
Your OO solution looks pretty advanced, and may be using features not found in my old Perl; also, I can't readily install new CPAN packages. I'll probably end up bastardizing the suggestions I've gotten here.
Carl Smotricz
Thanks also for your prototype link. Most enlightening, as I had been dying to ask "why?" Now the following quote: `My first thought any time I see code with prototypes is that the person who wrote it doesn't know Perl very well.` provides a perfectly good reason to use prototypes: I really don't know Perl very well, and I intend to advertise the fact! :)
Carl Smotricz
@Carl => there is nothing intrinsically wrong with prototypes, just that they are in no way what prototypes are in other languages. in Perl, they allow you to impose different types of context on the arguments of the function in the calling code (the same way that Perl's built in functions do). It is an advanced feature, and the usual rule of "don't use it until you know why you need it" applies. If you want to check parameters, that should be done inside the sub, when you pull the values out of `@_`.
Eric Strom
+2  A: 

Some thoughts:

  • If there are several $result_vars, I would recommend creating a separate subroutine for calculating each one.
  • If a subroutine relies on information outside that function, it should be passed in as a parameter to that subroutine, rather than relying on global state.
  • Alternatively wrap the whole thing in a class, with $result_var as an attribute of the class.

Practically speaking, there are a couple ways you could implement this:

(1) Have your &analyzeLine function return calculatedStuff, and add it to &result_var in a loop outside the function:

  $result_var = 0;
  foreach my $file (@files) {
      open(my $fh, $file);
          while (<$fh>) {
              $result_var += analyzeLine($_);
          }
      }
  }

  sub analyzeLine ($) {
      my $line = shift(@_);
      return calculatedStuff;
  }

(2) Pass $result_var into analyzeLine explicitly, and return the changed $result_var.

  $result_var = 0;
  foreach my $file (@files) {
      open(my $fh, $file);
          while (<$fh>) {
              $result_var = addLineToResult($result_var, $_);
          }
      }
  }

  sub addLineToResult ($$) {
      my $running_total = shift(@_);
      my $line = shift(@_);
      return $running_total + calculatedStuff;
  }

The important part is that if you separate out functions for each of your several $result_vars, you'll be more readily able to write clean code. Don't worry about optimizing yet. That can come later, when your code has proven itself slow. The improved design will make optimization easier when the time comes.

jcdyer
Several separate subs for the results is not practical, sorry to have given insufficient info. I had considered passing everything via parameters but will probably end up with classes. Thanks for your input!
Carl Smotricz
Ether
@Carl--No worries. That's why there are multiple design patterns. Still might be useful for someone reading this later. @Ether--Thanks. I've fixed that now. I'm no Perl guru, just a fan of clean code.
jcdyer