The purpose of the regex search is to find all template class instances in C++ header files. The instances can be formatted like this:

CMyClass<int> myClassInstance;

CMyClass2<
int,
int
> myClass2Instacen;

The search is performed by loading the entire file into a string:

open(FILE, $file);
$string = join('',<FILE>);
close(FILE);

The following regex is then used to find the class instances, even when an instance spans more than one line in the string:

$search_string = "\s*\w[^typename].*<(\s*\w\s*,?\n?)*)>\s*\w+.*";
$string =~ m/$search_string/;

The problem is that the search returns only one hit, even though the files contain more class instances.

Is it possible to get all hits with this approach from one of the regex backreference variables?

A:

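What you require is the \G assertion together with the /g modifier. In scalar context, m//g starts each search where the previous match on the same string ended, and \G additionally anchors the match to exactly that position (pos()), so successive matches pick up where the last one left off.

Here is the relevant entry in perlfaq6 (SO is having trouble with the link, so you may have to copy and paste it):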

http://perldoc.perl.org/perlfaq6.html#What-good-is-'%5cG'-in-a-regular-expression%3f
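A minimal sketch of the \G technique, assuming the goal is to walk a declaration token by token. Each m//gc attempt is anchored with \G at the current pos(); the /c flag keeps pos() from being reset when one alternative fails, so the next alternative is tried at the same spot. The helper name tokenize and the token classes are illustrative only, not from the FAQ.

```perl
use strict;
use warnings;

# Hypothetical helper: split a declaration into identifier and
# punctuation tokens, each match starting exactly where the last ended.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    while (1) {
        if    ( $text =~ /\G\s+/gc )      { next }               # skip whitespace
        elsif ( $text =~ /\G(\w+)/gc )    { push @tokens, $1 }   # identifier
        elsif ( $text =~ /\G([<>,;])/gc ) { push @tokens, $1 }   # punctuation
        elsif ( $text =~ /\G\z/gc )       { last }               # end of input
        else {
            die "Unexpected character at offset ", pos($text) // 0, "\n";
        }
    }
    return @tokens;
}

print join( ' ', tokenize("CMyClass2<\nint,\nint\n> myClass2Instacen;") ), "\n";
```

Because every pattern starts with \G, a character that none of the alternatives accept stops the scan instead of being silently skipped.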

Gavin Miller
Direct link to section referred to: http://perldoc.perl.org/perlfaq6.html#What-good-is-%27\G%27-in-a-regular-expression%3f
Chas. Owens
Thanks Chas :)
Gavin Miller
A:

First, if you are going to slurp files, you should use File::Slurp. Then you can do:

my $contents = read_file $file;

read_file will croak on error.
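If File::Slurp is not installed, a core-Perl equivalent is to localize $/ (the input record separator) to undef, so that a single readline returns the whole file. A minimal sketch; the sub name slurp is made up for illustration, and it dies with the OS error in $! rather than croaking.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string with core Perl only.
sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open '$file': $!\n";
    local $/;                  # undef $/ makes <$fh> return the entire file
    my $contents = <$fh>;
    close $fh;
    return $contents;
}
```

Usage would then be my $contents = slurp($file); with the lexical filehandle closed automatically even if you forget the close.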

Second, [^typename] does not exclude the string 'typename' at all: it is a character class that matches exactly one character that is not t, y, p, e, n, a or m. Other than that, it is not obvious to me that the pattern you use will consistently match the things you want it to match, but I can't comment on that right now.
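To reject text that begins with the word typename, a negative lookahead such as (?!typename\b) is the usual tool, because it tests a whole word rather than a set of single characters. A minimal sketch comparing the two; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Character class: matches ONE character that is not t, y, p, e, n, a or m.
my $char_class = qr/^[^typename]/;

# Negative lookahead: fails only when the word "typename" starts here.
my $lookahead = qr/^(?!typename\b)\w+/;

for my $s ( 'typename T', 'CMyClass<int> x;', 'my_var' ) {
    printf "%-18s class:%-3s lookahead:%s\n", $s,
        ( $s =~ $char_class ? 'yes' : 'no' ),
        ( $s =~ $lookahead  ? 'yes' : 'no' );
}
```

Note that 'my_var' is rejected by the character class merely because it starts with m, while the lookahead accepts it.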

Finally, to get all the matches in the file one by one, use the /g modifier in a while loop:

my $source = '3 5 7';

while ( $source =~ /([0-9])/g ) {
    print "$1\n";
}
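The same /g modifier also works in list context, where a single match returns every capture at once instead of one per loop iteration. A minimal sketch using the same sample string; without /g the match returns only the first capture, which is the behaviour described in the question.

```perl
use strict;
use warnings;

my $source = '3 5 7';

# Without /g: only the first capture is returned.
my ($first) = $source =~ /([0-9])/;

# With /g in list context: every capture in the string.
my @all = $source =~ /([0-9])/g;

print "first: $first\n";    # first: 3
print "all:   @all\n";      # all:   3 5 7
```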

Now that I have had a chance to look at your pattern, I am still not sure what to make of [^typename], but here is an example program that captures the part between the angle brackets (as that seems to be the only thing you are capturing above):

use strict;
use warnings;

use File::Slurp;

my $pattern = qr{
    ^
    \w+                    
    <\s*((?:\w+(?:,\s*)?)+)\s*> 
    \s*
    \w+\s*;
}mx;

my $source = read_file \*DATA;

while ( $source =~ /$pattern/g ) {
    my $match = $1;
    $match =~ s/\s+/ /g;
    print "$match\n";
}

__DATA__
CMyClass<int> myClassInstance;

CMyClass2<
int,
int
> myClass2Instacen;

C:\Temp> t.pl
int
int, int

However, I suspect you would prefer the following, which captures the whole declaration up to (but not including) the semicolon:

my $pattern = qr{
    ^
    (
      \w+                    
      <\s*(?:\w+(?:,\s*)?)+\s*> 
      \s*
      \w+
    )
    \s*;
}mx;

which yields:

C:\Temp> t.pl
CMyClass<int> myClassInstance
CMyClass2< int, int > myClass2Instacen
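If the class name, template arguments and instance name are needed separately, named captures (Perl 5.10 and later) keep them apart. A minimal sketch along the lines of the patterns above; the capture names class, args and name are chosen for illustration, and like those patterns it does not handle nested or namespace-qualified template arguments.

```perl
use strict;
use warnings;

my $pattern = qr{
    ^
    (?<class> \w+ )
    \s* < \s* (?<args> \w+ (?: \s*,\s* \w+ )* ) \s* >
    \s*
    (?<name> \w+ )
    \s* ;
}mx;

my $source = "CMyClass<int> myClassInstance;\n\n"
           . "CMyClass2<\nint,\nint\n> myClass2Instacen;\n";

my @found;
while ( $source =~ /$pattern/g ) {
    # Copy the captures first: the s/// below would reset %+.
    my ( $class, $args, $name ) = ( $+{class}, $+{args}, $+{name} );
    $args =~ s/\s+/ /g;    # collapse newlines in the argument list
    push @found, [ $class, $args, $name ];
}

print join( ' | ', @$_ ), "\n" for @found;
```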
Sinan Ünür
A: 

I'd do something like this:


#!/usr/bin/perl -w
use strict;
use warnings;

local(*F);
open(F, $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
my $text = do { local($/); <F> };   # undef $/ so <F> reads the whole file
my (@hits) = $text =~ m/([a-z]{3})/gsi;

print "@hits\n";

Assuming you have a text file like this:
/home/user$ more a.txt
a bb dkl jidij lksj lai suj ldifk kjdfkj bb
bb kdjfkal idjksdj fbb kjd fkjd fbb  kadfjl bbb
bb bb bbd i

This will print all the hits from the regex:


/home/user$ ./a.pl a.txt
dkl jid lks lai suj ldi kjd fkj kdj fka idj ksd fbb kjd fkj fbb kad fjl bbb bbd
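To count the hits without keeping them, a common idiom is to assign the list-context match to an empty list and use that assignment in scalar context, which yields the number of matches. A minimal sketch on the first line of the sample file above.

```perl
use strict;
use warnings;

my $text = "a bb dkl jidij lksj lai suj ldifk kjdfkj bb\n";

# A list assignment in scalar context returns the number of elements
# on its right-hand side, i.e. the number of /g matches.
my $count = () = $text =~ /[a-z]{3}/gi;

print "$count\n";
```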



A specific solution for your problem, using the same approach, might look like this:


#!/usr/bin/perl -w                                                                                                           
use strict;
use warnings;

my $text = <<ENDTEXT;
 CMyClass<int> myClassInstance;

CMyClass2<
int,
int
> myClass2Instacen;


CMyClass35<
int,
int
    > myClass35Instacen;

ENDTEXT

my $basename = "MyClass";
my (@instances) = $text =~ m/\s*(${basename}[0-9]*\s*\<.*?                                                                
                            (?=\>\s*${basename})                                                                          
                            \>\s*${basename}.*?;)/xgsi;

for(my $i=0; $i<@instances; $i++){
    print $i."\t".$instances[$i]."\n\n";
}

Of course, you will probably need to tweak the regex a bit more to handle all the edge cases in your data, but that should be a pretty good start.

blackkettle
open my $fh, $ARGV[0] is better than local(*F); open(F,$ARGV[0]); Also, use Perl::Critic on your examples.
Alexandr Ciornii
I tried Perl::Critic on my examples (a bit of a hassle to install), but it doesn't give any comments, warnings or errors for my example. Also, I noticed that the pre and code blocks are not properly escaping my left and right angle brackets...
blackkettle