views:

177

answers:

7

I hope this is something straightforward that I'm doing wrong. I saw something online about "variable suicide" that looked good, but it was for an older version and I'm on 5.10.1.

Anyway - a variable that I declared - $RootDirectory - just suddenly loses its value, and I can't figure out why.

Here's a script to reproduce the problem. When I run through the script in debug mode (perl -d) I can get it to print out the $RootDirectory at line 21 and 26. But it's gone by line 30.

use strict;
my $RootDirectory; 
my @RootDirectories; 

@RootDirectories = (
   'c:\\P4\\EDW\\PRODEDW\\EDWDM\\main\\db\\'
   ,'c:\\P4\\EDW\\PRODEDW\\EDWADS\\main\\db\\'
   ,'c:\\P4\\EDW\\PRODEDW\\FJE\\main\\db\\'
   );

foreach $RootDirectory (@RootDirectories) { 
   # $RootDirectory = 'c:\\P4\\EDW\\PRODEDW\\EDWDM\\main\\db\\';
   # print ' In foreach ' . $RootDirectory. "\n";
   RunSchema ();
} 

exit(0);

sub RunSchema() { 
   # print ' In RunSchema ' . $RootDirectory. "\n";
   CreateTables ();
} 

sub CreateTables() { 
   # print ' In CreateTables ' . $RootDirectory. "\n";
   SQLExecFolder ('tbl');
} 

sub SQLExecFolder() { 
   print ' In SQLExecFolder ' . $RootDirectory. "\n";       # Variable $RootDirectory value is gone by now
} 

EDIT Thanks for all the comments! I think for now I'll use the "our" keyword which appears to work well - thanks Nathan. Also thanks toolic about the Use Warnings - I think I'm sold on that one!

The thing that continues to confuse me is why, when I did debug mode (perl -d), and stepped through the code, doing "p $RootDirectory" I got the expected output at lines 21 and 26, but not line 30. How is the situation different at line 30?

Also, I appreciate the comments about best practice being to pass $RootDirectory as a function parameter. I wanted to avoid that because I have so many functions following that - i.e. RunSchema calls CreateTables which calls SQLExecFolder. All of them would have to have the same parameter passed. Does it still make sense in this case, or are there any better ways to structure this?

+4  A: 

You're declaring $RootDirectory as the loop variable in a foreach loop. As far as I understand, that means that its value is localized to the loop, and its value is restored to its previous value at the end of the loop.

In your case the variable was never assigned, so at the end of the loop it returns to its previous value of undef.

Edit: Actually, the problem is that $RootDirectory is declared with my, so it is undefined in other scopes. In the functions RunSchema, CreateTables and SQLExecFolder the variable is undefined, regardless of the localization of the foreach.

If you want the variable to be declared for strictness, but want it to be global, declare $RootDirectory with our:

our $RootDirectory;

Edit: That being said, it's not always a good idea to use a global variable. You're better off passing the variable as a parameter to the functions as others have suggested.

Nathan Fellman
And `$RoodDirectory` should be passed as function parameter
Ivan Nevostruev
The functions are in the same lexical scope as the original `$RootDirectory`. The problem is that `foreach` creates a **new** lexical variable when the loop variable is lexical. See http://perldoc.perl.org/perlsub.html#Private-Variables-via-my()
cjm
@Nathan Fellman: your Edit is incorrect - "my" variable at the top of file is visible in all functions inside the file. Using "our" is not necessary here.
Arkadiy
@Arkadiy - the "our" appeared to fix the problem. Maybe I'm missing something?
Sylvia
Avoid the temptation to use global variables.
daotoad
I assume that it fixes the problem because foreach treats our differently than my and does not localize it (although it's not obvious from the docs). My point was that your problem is caused by special treatment of "my" by the for loop, not by "my" as such.
Arkadiy
+3  A: 

foreach variable is special - it's local to the loop.

If the variable is preceded with the keyword my, then it is lexically scoped, and is therefore visible only within the loop. Otherwise, the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global one, but it's still localized to the loop. This implicit localisation occurs only in a foreach loop.

Please take a look here

Arkadiy
+2  A: 

The iterator variable in foreach loop is always localized to the loop. See the foreach section in perlsyn. You can pass it to a subroutine as a parameter.

eugene y
+7  A: 

What Nathan said is correct. That aside, why don't you pass in the value? It's better practice anyway:

foreach $RootDirectory (@RootDirectories) { 
   # $RootDirectory = 'c:\\P4\\EDW\\PRODEDW\\EDWDM\\main\\db\\';
   # print ' In foreach ' . $RootDirectory. "\n";
   RunSchema ($RootDirectory);
} 

sub SQLExecFolder { 
   my $RootDirectory = shift;
   print ' In SQLExecFolder ' . $RootDirectory. "\n";
} 
Vivin Paliath
A: 

Not a bad effort. Here are a couple of small improvements, and one "fix" which is to pass the variable to the subroutines, as a function parameter because the $RootDirectory variable is scoped (i.e. restricted) to within the foreach loop. In general it is also considered good practice in order to make explicit what variables are being passed and/or accessed by various subroutines.

use strict;
use warnings;

sub RunSchema() {
   my $root_dir = shift;
   CreateTables($root_dir);
}

sub CreateTables() {
   my $root_dir = shift;
   SQLExecFolder('tbl', $root_dir);
}

sub SQLExecFolder() {
   my ($name, $root_dir) = @_;
}
######################################################


my @RootDirectories = qw(
   c:\\P4\\EDW\\PRODEDW\\EDWDM\\main\\db\\
   c:\\P4\\EDW\\PRODEDW\\EDWADS\\main\\db\\
   c:\\P4\\EDW\\PRODEDW\\FJE\\main\\db\\
);

foreach my $RootDirectory (@RootDirectories) {
   # print ' In foreach ' . $RootDirectory. "\n";
   RunSchema($RootDirectory);
}

exit(0);
mctylr
No, no no. Using function prototypes is useful when you know what you're doing, but it's not or "defining parameters for the subroutines".
mpeters
There is no need to define a variable in Perl before it is first used.
Ether
I've made the suggested changes, which I both agree with. I've also improved my wording for usage of a parameter being passed to the subroutine.
mctylr
+4  A: 

Others have answered your question correctly. I just want to emphasize that you should add use warnings; to your code. It would have given a clue to your problem, and it would alert you to another potential hazard.

toolic
+2  A: 

RE: When to use a global variable?

Global variables are risky because they can be changed at any time by any part of the code that accesses them. In addition, it is difficult to track when and where a change occurs, which makes it harder to track down unintentional consequences from modification. In short, each global variable increases coupling between the subroutines that use it.

When does it make sense to use a global? When the benefits outweigh the risks.

If you have many different values needed by most or all of your subroutines, it seems like a good time to use global variables. You can simplify every subroutine invocation, and make the code clearer, right?

WRONG. In this case the right approach is to aggregate all those distinct variables in one container data structure. So instead of foo( $frob, $grizzle, $cheese, $omg, $wtf ); you have foo( $state, $frob ); Where $state = { grizzle => $grizzle, cheese => $cheese, omg => $omg, wtf => $wtf };.

So now we have one variable to pass around. All those sub calls are much simpler. Yet, even so, this is onerous and you still want to clean up the extra argument from each routine.

At this point you have several options:

  1. Make $state global and just access it directly.
  2. Make $state into a configuration object and use methods to control access to attributes.
  3. Make the whole module into a class, and store all the state information in an object.

Option 1 is acceptable for small scripts with few routines. The risk of hard to debug errors is small.

Option 2 makes sense when there is no obvious relationship between the different routines in the module. Using a global state object helps because it decreases coupling between code that accesses it. It is also easier to add logging to track changes to the global data.

Option 3 works well if you have a group of closely related functions that operate on the same data.

Your sample code seems like a good candidate for option 3. I created a class called MySchema and all the methods that operate on a specific directory are now methods. The invoking object carries the data it needs with it.

Now we have nice, clean code and no globals.

use strict;
use warnings;

my @directories = (
   'c:\\P4\\EDW\\PRODEDW\\EDWDM\\main\\db\\',
   'c:\\P4\\EDW\\PRODEDW\\EDWADS\\main\\db\\',
   'c:\\P4\\EDW\\PRODEDW\\FJE\\main\\db\\',
);

for my $schema ( make_schemata(@directories) ) {

    $schema->run;

}

sub make_schemata {
    my @schemata = map { MySchema->new( directory => $_ } @_;

    return @schemata;
}


BEGIN {
    package MySchema;

    use Moose;

    has 'directory' => (
        is => 'ro',
        isa => 'Str',
        required => 1,
    );

    sub run { 
       my $self = shift;

       $self->create_tables;
    } 

    sub create_tables { 
       my $self = shift;

       $self->sql_exec_folder('tbl');
    }

    sub sql_exec_folder {
        my $self = shift;

        my $dir = $self->directory;

        print "In SQLExecFolder $dir\n";
    }

    1;
} 

As a bonus, the the code in the BEGIN block can be removed and placed in a separate file for reuse by another script. All it needs to be a full-fledged module is its own file named MySchema.pm.

daotoad