
My thoughts on how to grab all scalars and arrays out of a Perl file went along these lines:

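use strict;
use warnings;

# Read the whole script into memory, one line per element
open(my $InFile, '<', 'SomeScript.pl') or die "Cannot open SomeScript.pl: $!";
my @InArray = <$InFile>;
my @OutArray = ();              # start empty; {} would add a hash reference
close($InFile);
my $ArrayCount = @InArray;      # number of lines
open(my $OutFile, '>', 'outfile.txt') or die "Cannot open outfile.txt: $!";
for (my $x = 0; $x < $ArrayCount; $x++) {   # last valid index is $ArrayCount - 1
  my $Testline = $InArray[$x];              # a single element takes the $ sigil
  # Capture only the first $name or @name on the line
  if ($Testline =~ m/((@|\$)[A-Z]+)/i) {
    my $Outline = "$1\n";
    push @OutArray, $Outline;
  }
}
print $OutFile @OutArray;
close($OutFile);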

...and this works fairly well. The problem is that if multiple variables appear on a line, it only grabs the first one. For example:

$FirstVar = $SecondVar + $ThirdVar;

The script would only grab $FirstVar and write it to the file. That might still be acceptable, because $SecondVar and $ThirdVar have to be initialized somewhere else before this line means anything. I guess the exception would be a line in which several variables are initialized at once.

Can anyone think of an example in real perl code that would break this script? Also, can anyone show me how to grab multiple items that match my regular expression's criteria from the same line?

Thanks in advance!

+1  A: 

The simple-minded answer is to use the /g flag on your regexp.

The complex answer is that this sort of code analysis is very difficult for Perl. Look at the PPI module for a better, more full-featured semantic analysis of Perl code.
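A minimal sketch of the /g suggestion, using the example line from the question (the character class [@\$] is borrowed from another answer in this thread, and the helper name vars_on_line is hypothetical). In list context a //g match returns every match at once; in scalar context a while loop steps through the matches one at a time, with pos() reporting where the last match ended.

```perl
use strict;
use warnings;

# Hypothetical helper: return every $scalar / @array token on one line
# (a //g match in list context returns all matches)
sub vars_on_line {
    my ($line) = @_;
    return $line =~ /[@\$]\w+/g;
}

my $line = '$FirstVar = $SecondVar + $ThirdVar;';
print join(' ', vars_on_line($line)), "\n";   # prints: $FirstVar $SecondVar $ThirdVar

# Scalar-context //g: each iteration resumes where the previous match stopped
while ($line =~ /([@\$]\w+)/g) {
    print "$1 ends at offset ", pos($line), "\n";
}
```

The list form is usually all you need for collecting names; the while form is handy when you also want the position of each match.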

Robert Mah
A: 

I can't answer either of your questions directly, but I will offer this: I don't know why you're trying to extract scalars, but the debugger that ships with Perl has to "know" about all variables, and the last time I looked it was written in Perl. You may be better off evaluating the script with the debugger, or with techniques borrowed from it, than reinventing the wheel.

Chris Cleeland
+3  A: 

It looks like this will miss fully qualified variable names ($My::Package::Foo) and the rare but valid variable names enclosed with braces (${variable}, ${"varname!with#special+chars"}). Your script will also match element accesses of hashes and arrays ($array[4] ==> $array, $hash{$key} ==> $hash), and object method calls ($object->method() ==> $object), which may or may not be what you want.

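You also truncate variables with underscores ($my_var becomes $my) and digits ($var3 becomes $var), since [A-Z]+ stops at the first non-letter, and you could get false positives from comments, quoted strings, POD, etc. (for instance, the @ of an email address in a comment such as # report bugs to ...).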

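Matching multiple expressions is a matter of using the /g modifier, which returns a list of all matches when used in list context:

my @vars = $Testline =~ /[@\$]\w+/g;   # every match on the line, not just the first
if (@vars > 0) {
  push @OutArray, map { "$_\n" } @vars;   # one name per output line
}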
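Some of the cases listed above can be absorbed into the pattern itself. A sketch, assuming you also want %hashes, package-qualified names and the simple braced form; the helper name find_vars is hypothetical, and the pattern still matches inside comments, strings and POD, and still misses forms like ${"name with spaces"}.

```perl
use strict;
use warnings;

# Hypothetical helper: one regex that also accepts %hashes,
# package-qualified names and the simple braced form
sub find_vars {
    my ($line) = @_;
    return $line =~ /
        [\$\@%]                  # sigil
        (?:
            \{ \w+ \}            # braced form, e.g. {name}
          | \w+ (?: :: \w+ )*    # plain or package-qualified name
        )
    /xg;
}

print join(' ', find_vars('$My::Package::Foo = ${bar} + %baz;')), "\n";
# prints: $My::Package::Foo ${bar} %baz
```

Element accesses such as $hash{$key} are still reported under their $ spelling; telling those apart reliably needs a real parser.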
mobrule
Those are things I would have never thought of! Thank you!
Micah
A: 

Despite the limitations of this approach, here is a slightly simpler version of the script above. It reads from STDIN, or from any files named on the command line.

#!/usr/bin/perl
use strict;
use warnings;
my %vars;

while (<>) {
  $vars{$_}++ for (m'([$@]\w+)'g);
}

my @vars = sort keys %vars;   # sorted, so the output is stable between runs
print "@vars\n";
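The %vars hash in the script above already counts how often each name occurs, but only the names are printed. A small variation can report the counts too, sorted by name. This is a sketch: the helper count_vars and the sample lines are hypothetical, and a real script would pass in the lines read with <>.

```perl
use strict;
use warnings;

# Hypothetical helper: count each $scalar / @array token across a list of lines
sub count_vars {
    my %count;
    for my $line (@_) {
        # single-quote delimiters stop Perl from interpolating $@ in the pattern
        $count{$_}++ for $line =~ m'([$@]\w+)'g;
    }
    return %count;
}

# Sample input standing in for the lines of a real script
my @lines = (
    '$FirstVar = $SecondVar + $ThirdVar;',
    'print $FirstVar;',
);
my %count = count_vars(@lines);
printf "%-12s %d\n", $_, $count{$_} for sort keys %count;
```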
ar
+6  A: 

Don't do that

You can't really parse Perl with regexes, so I wouldn't even try.
You can't even properly parse it without actually running it, but you can get close with PPI.

perl-variables.pl

#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.1;

use PPI;
use PPI::Find;

my($filename) = (@ARGV, $0); # checks itself by default

my $Doc = PPI::Document->new($filename);
my $Find = PPI::Find->new( sub{
  return 0 unless $_[0]->isa('PPI::Token::Symbol');
  return 1;
});

$Find->start($Doc);
while( my $symbol = $Find->match ){
  my $raw = $symbol->content;
  my $var = $symbol->symbol;
  if( $raw eq $var ){
    say $var;
  } else {
    say "$var\t($raw)";
  }
}
print "\n";

my @found = $Find->in($Doc);
my %found;
$found{$_}++ for @found;

say for sort keys %found;
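If you only need the distinct variables, PPI's find method on the document may be a simpler alternative to PPI::Find, since it returns every PPI::Token::Symbol in one call. A minimal sketch, assuming PPI is installed; the helper symbols_in and the sample source string are hypothetical. As in the script above, symbol() reports an element access such as $h{x} under the container's name, %h.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical helper: distinct variables in a string of Perl source,
# in the order they first appear
sub symbols_in {
    my ($source) = @_;
    my $doc = PPI::Document->new(\$source)
        or die "PPI could not parse the source\n";
    # find() returns an array ref of matches, or a false value when
    # nothing matched (or on error), hence the fallback to []
    my $symbols = $doc->find('PPI::Token::Symbol') || [];
    my %seen;
    return grep { !$seen{$_}++ } map { $_->symbol } @$symbols;
}

print "$_\n" for symbols_in('my %h; $h{x} = $y + @z;');
```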

Running it against itself produces:

$filename
@ARGV
$0
$Doc
$filename
$Find
@_  ($_)
$Find
$Doc
$symbol
$Find
$raw
$symbol
$var
$symbol
$raw
$var
$var
@found
$Find
$Doc
%found
%found  ($found)
$_
@found
%found

$0
$Doc
$Find
$_
$filename
$found
$raw
$symbol
$var
%found
@ARGV
@found
Brad Gilbert