I know that in a subroutine in Perl, it's a very good idea to preserve the "default variable" $_ with local before doing anything with it, in case the caller is using it, e.g.:

sub f() {
    local $_;              # Ensure $_ is restored on dynamic scope exit
    while (<$somefile>) {  # Clobbers $_, but that's OK -- it will be restored
        ...
    }
}

Now, often the reason you use $_ in the first place is because you want to use regexes, which may put results in handy "magic" variables like $1, $2 etc. I'd like to preserve those variables too, but I haven't been able to find a way to do that.

All perlvar says is that @+ and @-, which $1 etc. seem to depend on internally, refer to the "last successful submatches in the currently active dynamic scope". But even that seems at odds with my experiments. Empirically, the following code prints "aXaa" as I had hoped:

$_ = 'a';
/(.)/;          # Sets $1 to 'a'
print $1;       # Prints 'a'
{
    local $_;   # Preserve $_
    $_ = 'X';
    /(.)/;      # Sets $1 to 'X'
    print $1;   # Prints 'X'
}
print $_;       # Prints 'a' ('local' restored the earlier value of $_)
print $1;       # Prints 'a', suggesting localising $_ does localise $1 etc. too

But what I find truly surprising is that, in my ActivePerl 5.10.0 at least, commenting out the local line still preserves $1 -- that is, the answer "aXXa" is produced! It appears that the lexical (not dynamic) scope of the brace-enclosed block is somehow preserving the value of $1.

So I find this situation confusing at best and would love to hear a definitive explanation. Mind you, I'd actually settle for a bulletproof way to preserve all regex-related magic variables without having to enumerate them all as in:

local @+, @-, $&, $1, $2, $3, $4, ...

which is clearly a disgusting hack. Until then, I will worry that any regex I touch will clobber something the caller was not expecting to be clobbered.

Thanks!

+3  A: 

I am not sure there is any real reason to be this paranoid about all these variables. I have managed to use Perl for almost ten years without once needing to use an explicit local in this context.

The answer to your specific question is: The number of digit variables is not a given (even though there is a hard memory limit to how many matches you can work with). So, it is not possible to localize all of them at the same time.

Sinan Ünür
@Sinan, not quite. They're all already always local()ized. The questioner is a bit confused by this and perhaps by the intricacies of the two scoping paradigms in perl.
pilcrow
@Sinan: Caller-save doesn't scale to large systems well because the caller has to always be mindful of what all called functions will clobber, all the way down the call graph. So it's easy for a change in a low-level function's implementation to mess with a high-level function. It's fine for 100-line scripts, but if you are writing a big system you have to be able to *not care* about the implementation details of the functions you call.
j_random_hacker
But, see, that is the thing: I have never had to care about the implementation details of the functions I call (well, let's forget about `Win32::OLE` for a moment). Given pilcrow's clarification, it is obvious why.
Sinan Ünür
@Sinan: I guess we're lucky in this case :) In general though, it's a good idea to save and restore in the called function rather than the caller, because if everyone does that then you only have to look at what the current function clobbers to decide what to save -- as opposed to needing to look at the entire call stack. This applies to other global state as well (e.g. current directory, binmode(), $?, $! etc.) unless of course you specifically want the function to alter that state.
j_random_hacker
@j_random_hacker anything other than top level code gets passed the directory rather than operating on the current directory. As for the library writer localizing `$!`: That's just weird. The calling code should only check `$!` after an error.
Sinan Ünür
@Sinan: Sure, $! is a weak example, there's no reason to localise it. But regarding the current directory: sometimes you don't have a choice, because the function may need to run a program that expects the current directory to be somewhere different. (Have you really not experienced this in 10 years of Perl?) In fact, NOT changing directory can be a security issue: http://www.gnu.org/software/findutils/manual/html_mono/find.html#Changing-the-Current-Working-Directory
j_random_hacker
@j_random_hacker I appreciate the cwd example but I think the argument for localizing of a large set of special variables is not strong. Anyway, I forgot to add smileys after the *weird* above ;-)
Sinan Ünür
I would only worry about `local()`izing `$_`
Brad Gilbert
Since ysth has pointed out that Perl effectively auto-localises these variables, the point is moot, but if that were not the case then I can't see how anyone could defend the idea that it's not "worth" preserving these variables. It's for *exactly* the same reason all programmers prefer local (in Perl, lexical) variables to global variables -- because doing so limits changes to global state, which in turn enables us to reason more efficiently (and accurately) about program flow.
j_random_hacker
+8  A: 

Maybe you can suggest a better wording for the documentation. Dynamic scope means everything up to the start of the enclosing block or subroutine, plus everything up to the start of that block or subroutine call, etc. except that any closed blocks are excluded.

Another way to say it: "last successful submatches in the currently active dynamic scope" means there is implicitly a local $x=$x; at the start of each block for each variable.

Most of the mentions of dynamic scope (for instance, http://perldoc.perl.org/perlglossary.html#scope or http://perldoc.perl.org/perlglossary.html#dynamic-scoping) are approaching it from the other way around. They apply if you think of a successful regex as implicitly doing a local $1, etc.
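
To make the implicit localisation concrete, here is a minimal sketch rather than anything taken from the documentation. The sub name helper, the array @seen and the sample strings are made up for illustration. It records $1 before and after a sub call and before and after a bare block, each of which runs its own successful match:

```perl
use strict;
use warnings;

# Hypothetical helper: it runs its own successful match, so $1 is set
# inside the sub, but only for the duration of the sub's scope.
sub helper {
    my ($text) = @_;
    my ($digits) = $text =~ /(\d+)/;   # list context returns the capture
    return $digits;
}

my @seen;   # illustrative log of what $1 looked like at each step

if ('caller-value' =~ /^(\w+)-/) {
    push @seen, $1;                 # capture from the caller's own match
    push @seen, helper('abc 42');   # the helper matches internally
    push @seen, $1;                 # caller's capture after the call returns
    {
        my $ok = 'block' =~ /(b)/;  # a different match inside a bare block
        push @seen, $1 if $ok;      # capture from the inner match
    }
    push @seen, $1;                 # caller's capture after the block closes
}

print join(',', @seen), "\n";       # prints "caller,42,caller,b,caller"
```

The outer $1 reads 'caller' again as soon as the sub returns or the inner block closes, without any explicit local.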

ysth
+1. `$1` and so forth are automagically `local()`ized, and scope is confusing.
pilcrow
Thanks, very helpful! Yes, I was confused by the term "dynamic scope", which (for some reason) I took to exclude/ignore blocks. It's a relief to know that $1 etc. are implicitly localised. (Based on your links, I guess there's nothing explicitly in the Perl docs saying so?)
j_random_hacker
+1  A: 

I think you are worrying too much. The best thing to do is run your match operator, immediately save the values you want into meaningful variables, then let the special variables do whatever they do without worrying about them:

if( $string =~ m/...(a.c).../ ) {
    my $found = $1;
    }

When I want to capture parts of the strings, I most often use the match operator in list context to get a list of the memories back:

my @array = $string =~ m/..../g;
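
As a rough sketch of the list-context approach (the input string and the variable names $host, $port and @pairs are invented for illustration), the captures can go straight into lexical variables, so $1 and friends are never read at all:

```perl
use strict;
use warnings;

my $string = 'host=alpha port=8080';   # hypothetical input

# In list context a successful match returns its captured groups,
# and a failed match returns an empty list, so the 'or die' fires.
my ($host, $port) = $string =~ /host=(\w+)\s+port=(\d+)/
    or die "no match";

# With /g in list context, every capture from every match is returned.
my @pairs = $string =~ /(\w+)=(\w+)/g;

print "$host $port\n";   # prints "alpha 8080"
print "@pairs\n";        # prints "host alpha port 8080"
```

Since the values are copied into lexicals at the moment of the match, later matches anywhere else cannot affect them.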
brian d foy
Have to disagree there :) Over the years I've noticed that any coding practice that seems "unlikely" to cause problems (e.g. relying on $1 etc. not changing across calls to other functions), will eventually cause problems *unless there is a guarantee that it can't*, so nowadays I look for guarantees. Happily, in this case ysth was able to show me that Perl does provide such a guarantee.
j_random_hacker
So a guy goes into the doctor and says "It hurts when I move my arm like this!". The doctor says "Don't move your arm like that!". My first code snippet isn't fragile at all. It's your change to it that is the problem. I purposely did the operations in isolation. You don't do what I say: immediately save the result.
brian d foy
"It's [my] change to it that is the problem." That is absolutely correct. My point is that my change is the type of change that happens *all the time* as code is maintained. Your code is correct *today*, but experience has taught that that's not enough -- I want code that is likely to stay correct in the future. That is what I call *robust* code. Again, this is only important if you are maintaining a large system over a long time. I am, and my philosophy has gradually become more and more maintenance-centric as a result.
j_random_hacker
I don't mean to suggest that you can remove any possibility of creating bugs in the future -- if someone is careless enough, they will introduce bugs. But there are ways to reduce that risk, and clamping down on global state changes is one of them. So if there is a no-cost (or low-cost) way to do that, I will use it.
j_random_hacker
Well, do what you like, but there's absolutely nothing that will keep people from changing good working code into bad broken code no matter what you do. In your sense, there is no such thing as robust code and never can be. My code is correct today and is correct tomorrow and next week and next year. Nothing I do will stop anyone from introducing bugs.
brian d foy
Perfectly robust code doesn't exist, but there is more robust code and less robust code. "Nothing I do will stop anyone from introducing bugs" -- you're not serious, are you? If so then why bother with modularity, separation of concerns, or any other design principles? The whole point is to make *future* code changes (a) easier and (b) less bug-prone.
j_random_hacker
@j_random_hacker: if you want to keep debating just because you are bored, go somewhere else. You've said what you want to say, and I've said what I want to say. We disagree. Saying it more isn't going to make me agree with you.
brian d foy
That last comment strikes me as a bit defensive. Sorry if I appeared to attack you; I was just trying to understand where you're coming from. But it's fine if you don't want to elaborate any further. There's more to life than debating (which I think is what we were *both* doing, BTW).
j_random_hacker