ansaurus

Question

Finding it hard to fix this. Please guide me by checking my edited code

Answer 1

+3 A:

Your regex is ill-formed.

^([int64/void/boolean)]/(.*)/\(/\{//\n

You probably meant something like:

/^(int64|void|boolean)\s+(\w+)\s*\(.*?\)\s*\{/

That is, one of int64, void, or boolean, some whitespace, an identifier, optional whitespace, an opening parenthesis, some content, a closing parenthesis, some optional whitespace (which might be a newline), and an opening curly brace.
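As a quick sanity check, here is a minimal sketch that runs the corrected pattern over a few made-up lines. The sample lines and the helper name definition_name are assumptions for illustration; they are not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the asker's real input file is not shown here.
my @lines = (
    'int64 GetTotal(char *a, char *b) {',
    'void Reset() {',
    'int64 GetTotal(char *a, char *b);',    # prototype: no opening brace
    '    total = total + 1;',
);

# Return the function name if the line opens a definition, undef otherwise.
sub definition_name {
    my ($line) = @_;
    return $2 if $line =~ /^(int64|void|boolean)\s+(\w+)\s*\(.*?\)\s*\{/;
    return;
}

for my $line (@lines) {
    my $name = definition_name($line);
    print defined $name ? "match: $name\n" : "no match: $line\n";
}
```

Only the first two lines should be reported as matches, because the pattern insists on the opening curly brace after the parameter list.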

Jon Purdy 2010-08-10 13:31:01

Answer 2

+3 A:

I would like to say that the way you go through the file is unusual. Usually you would use something like

open my $handle, '<', $filename;
while (<$handle>) {
    if (/^(void|boolean|int64).../) {
        # do something with the matching line in $_
    }
}
close $handle;

This code opens the file and reads it one line at a time. The result is the same as what you get by reading the whole file into an array, joining it, and iterating over its elements.

An unrelated but important hint for using Perl: put these three use lines right after the shebang at the beginning of your script:

#!/usr/bin/perl
use strict;
use warnings;
use autodie qw(:all);

The warnings pragma tells you, for example, when you use an uninitialized variable in a concatenation, among other things. The strict pragma forces you to declare variables with the my keyword. This sounds like a bit of a hassle, but it protects you from problems caused by a mistyped variable name: with strict enabled, Perl refuses to compile code that uses an undeclared variable. The autodie pragma checks for failure on system calls such as open.

The regexp you are using is wrong. Be aware that regexps in Perl are normally enclosed by slashes, so /\s+\w*.*/ is a valid regexp. You are using a slash inside your regexp, which closes the expression prematurely. If you have to match against slashes in your text, you have to escape them with a backslash or use a different delimiter altogether.
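To make the two options concrete, here is a small sketch. The path string is a made-up example and not part of the question; both matches capture the same directory name.

```perl
use strict;
use warnings;

# Hypothetical input that contains literal slashes.
my $path = '/usr/local/bin/perl';

# Option 1: keep / as the delimiter and escape every literal slash.
my ($escaped) = $path =~ /^\/usr\/local\/(\w+)\//;

# Option 2: use m{...} as the delimiter, so slashes need no escaping.
my ($braced) = $path =~ m{^/usr/local/(\w+)/};

print "escaped: $escaped\n";
print "braced:  $braced\n";
```

The second form is usually easier to read once a pattern contains more than one or two slashes.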

BTW, you also have a typo: @filecontent vs. @file_content. This is a perfect place where use strict; would have warned you.
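A minimal sketch of how strict catches that kind of typo. The snippet is compiled through a string eval only so the failure can be captured and printed; the contents of the array are an assumption for illustration.

```perl
use strict;
use warnings;

# Code with the same kind of typo, kept in a string so that the
# compile-time failure can be captured instead of aborting this script.
my $snippet = <<'END';
use strict;
my @file_content = ('int64 f() {');
my $joined = join '', @filecontent;   # typo: missing underscore
END

my $ok    = eval "$snippet; 1";
my $error = $@;

print $ok ? "compiled\n" : "strict refused the code:\n$error";
```

Without use strict, the misspelled array would silently be treated as a new, empty package variable and the join would just produce an empty string.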

GorillaPatch 2010-08-10 13:43:36

@daxim Thanks for the edits. Why is it better to use a file handle of the form $handle instead of FILENAME?

GorillaPatch 2010-08-10 13:53:21

http://stackoverflow.com/questions/1479741/why-is-three-argument-open-calls-with-lexical-filehandles-a-perl-best-practice

daxim 2010-08-10 16:20:18

Thanks for the tip that you provided, daxim.

Sreeja 2010-08-10 16:59:04

@daxim Thanks a lot. Many thanks!

GorillaPatch 2010-08-10 17:53:19

Answer 3

+4 A:

Perhaps this example will help:

use strict;
use warnings;

my $n = 0;

# Supply the input file name as a command-line argument.
# Perl will open the file and process it line by line.
# No need to hard-code the file name in the program, which
# means the script could be reused.
while (my $line = <>){
    # The regex is applied against $line.
    # It will return the items captured by parentheses.
    # The /x option causes Perl to ignore whitespace
    # in our definition of the regex -- for readability.
    my ($type, $func, $args) = $line =~ /^
        ( int64|void|boolean ) \s+
        ( \w+ )                \s*
        \( (.+?) \)
    /x;

    # Skip the line if our regex failed.
    next unless defined $type;

    # Keep track of N of functions, print output, whatever...
    $n++;
    print $_, "\n" for '', $type, $func, $args;
}

print "\nN of functions = $n\n";

FM 2010-08-10 13:45:56

Nice one! I like how you directly name the matches of the regexp. It makes the code so much more readable. I learned something today. Thanks! +1 for that.

GorillaPatch 2010-08-10 13:49:05

Answer 4

+1 A:

This will do the trick:

#!/usr/bin/env perl
use strict;
use warnings;
use autodie qw(:all);

my $function_count = 0;
open my $input, '<', 'function.txt';
while (defined(my $line = <$input>)) {
    chomp($line);

    if (my ($func) = $line =~ /^(?:int64|void|boolean)\s?(.*?)\(/) {
        print qq{Found function "$func"\n};
        $function_count++;
    }
}
close $input;
print "$function_count\n";

Revised answer taking into account the function calls:

#!/usr/bin/env perl
use strict;
use warnings;
use autodie qw(:all);

my $document;
{
    local $/ = undef;
    open my $input, '<', 'function.txt';
    $document = <$input>;
    chomp $document;
    close $input;
}

my $function_count = 0;
while ($document =~ /(?:int64|void|boolean)\s?(.*?)\(.*?\)\s*\{/gs) {
    my $func = $1;  # scalar-context /g advances one match per iteration
    print qq{Found function "$func"\n};
    $function_count++;
}

print "$function_count\n";

Narthring 2010-08-10 13:48:23

No need to chomp the line, I think, because you are not matching against the end of the line.

GorillaPatch 2010-08-10 13:50:46

Very true, I just added the chomp out of force of habit.

Narthring 2010-08-10 14:45:30

@Narthring Thanks. But the above code also counts function prototypes, for example: int64 AccounntBalance(char *reg12,char *refid,char **id);

Sreeja 2010-08-10 17:43:22

Answer 5

+3 A:

Regular expressions are not parsers. It's always better to use a parser if you can.
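To illustrate the point, here is a sketch of how a line-oriented pattern in the style used elsewhere in this thread goes wrong on valid code. The source text and the function names in it are made up for this example.

```perl
use strict;
use warnings;

# A line-oriented pattern in the style discussed in this thread.
my $pattern = qr/^(?:int64|void|boolean)\s+(\w+)\s*\(.*?\)\s*\{/;

# Hypothetical source: the real definitions are Helper, Split and Reset.
my $source = <<'END';
/* disabled:
void Old() {
}
*/
static void Helper() {
}
void Split(int a,
           int b) {
}
void Reset() {
}
END

my @found;
for my $line (split /\n/, $source) {
    push @found, $1 if $line =~ $pattern;
}

print "regex found: @found\n";
```

The pattern reports Old, which only exists inside a comment, and misses Helper (leading static) and Split (parameter list spread over two lines). A real parser has no trouble with any of these.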

A simple approach is to lean on the parser in ctags:

#! /usr/bin/perl

use warnings;
use strict;

sub usage { "Usage: $0 source-file\n" }

die usage unless @ARGV == 1;

open my $ctags, "-|", "ctags", "-f", "-", @ARGV
  or die "$0: failed to start ctags\n";

while (<$ctags>) {
  chomp;
  my @fields = split /\t/;
  next unless $fields[-1] eq "f";
  print $fields[0], "\n";
}

Sample run:

$ ./getfuncs prog.cc
AccounntBalance
AccountRetrivalForm

Another approach uses g++'s -fdump-translation-unit option, which makes it dump a representation of the parse tree that you can dig through, as in the following example.

We begin with the usual front matter:

#! /usr/bin/perl

use warnings;
use strict;

Processing requires the name of the source file and any necessary compiler flags.

sub usage { "Usage: $0 source-file [ cflags ]\n" }

The translation-unit dump has a straightforward format:

@1      namespace_decl   name: @2       srcp: :0      
                         dcls: @3      
@2      identifier_node  strg: ::       lngt: 2       
@3      function_decl    name: @4       mngl: @5       type: @6      
                         srcp: prog.c:12               chan: @7      
                         args: @8       link: extern  
@4      identifier_node  strg: AccountRetrivalForm     lngt: 19

As you can see, each record begins with an identifier, followed by a type, and then one or more attributes. Regular expressions and a bit of hash twiddling are sufficient to give us a tree to inspect.

sub read_tu {
  my($path) = @_;
  my %node;

  open my $fh, "<", $path or die "$0: open $path: $!";
  my $tu = do { local $/; <$fh> };

  my $attrname = qr/\b\w+(?=:)/;
  my $attr =
    qr/($attrname): \s+ (.+?)      # name-value
       (?= \s+ $attrname | \s*$ )  # terminated by whitespace or EOL
      /xm;

  my $fullnode =
    qr/^(@\d+) \s+ (\S+) \s+  # id and type
        ((?: $attr \s*)+)     # one or more attributes
        \s*$                  # consume entire line
      /xm;

  while ($tu =~ /$fullnode/g) {
    my($id,$type,$attrs) = ($1,$2,$3);

    $node{$id} = { TYPE => $type };
    while ($attrs =~ /$attr \s*/gx) {
      if (exists $node{$id}{$1}) {
        $node{$id}{$1} = [ $node{$id}{$1} ] unless ref $node{$id}{$1};
        push @{ $node{$id}{$1} } => $2;
      }
      else {
        $node{$id}{$1} = $2;
      }
    }
  }

  wantarray ? %node : \%node;
}
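The "hash twiddling" in the inner loop is the usual idiom for keys that may repeat: the first value is stored as a plain scalar, and a repeated key promotes the slot to an array reference. A standalone sketch of just that idiom follows; the helper name add_attr and the attribute values are assumptions for illustration.

```perl
use strict;
use warnings;

# Store $value under $key; a repeated key is promoted to an array reference.
sub add_attr {
    my ($node, $key, $value) = @_;
    if (exists $node->{$key}) {
        $node->{$key} = [ $node->{$key} ] unless ref $node->{$key};
        push @{ $node->{$key} }, $value;
    }
    else {
        $node->{$key} = $value;
    }
}

my %node = (TYPE => 'function_decl');
add_attr(\%node, name => '@4');
add_attr(\%node, note => 'first');     # hypothetical repeated attribute
add_attr(\%node, note => 'second');

print "name: $node{name}\n";
print "note: @{ $node{note} }\n";
```

Callers then have to check with ref whether a given attribute holds one value or several, which is the price of keeping the common single-value case simple.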

In the main program, we feed the code to g++:

die usage unless @ARGV >= 1;

my($src,@cflags) = @ARGV;
system("g++", "-c", "-fdump-translation-unit", @cflags, $src) == 0
  or die "$0: g++ failed\n";

my @tu = glob "$src.*.tu";
unless (@tu == 1) {
  die "$0: expected one $src.*.tu file, but found",
      @tu ? ("\n", map("  - $_\n", @tu))
          : " none\n";
}

Assuming all went well, we then pluck out the function definitions given in the specified source file.

my $node = read_tu @tu;

sub isfunc {
  my($n) = @_;
  $n->{TYPE} eq "function_decl"
             &&
  index($n->{srcp}, "$src:") == 0;
}

sub nameof {
  my($n) = @_;
  return "<undefined>" unless exists $n->{name};
  $n->{name} =~ /^@/
    ? $node->{ $n->{name} }{strg}
    : $n->{name};
}

print "$_\n" for sort
                 map nameof($_),
                 grep isfunc($_),
                 values %$node;

Example run:

$ ./getfuncs prog.cc -I.
AccounntBalance
AccountRetrivalForm

Greg Bacon 2010-08-10 15:36:41

That's the way to do it. I think that you should put the ctags way in front, though.

Svante 2010-08-10 16:31:32

@Svante Good suggestion! Updated.

Greg Bacon 2010-08-10 16:56:40

I agree with you. This is clearly a C-based language he's trying to deal with, and it's just too easy to break any regex with some perfectly valid function declarations. Better to use a tool that understands the language.

Mike Ellery 2010-08-11 18:27:59
