I want to retrieve the number of functions (only function definitions, not function calls) present in a text file, i.e. a count.

The text file (function.txt) is below:

#include<main.h>
#include<mncl.h>
int reg23;
int refid23;
int64 AccounntBalance(char *reg12,char *refid,char **id){  //dis is function1
ref();
if(id>100)
  {
  do(&ref);
  }
}                                                        //dis is end of fucntion1
void AccountRetrivalForm(char **regid,char **balance,char **id)   //dis is function2
   {
   doref(); 
   int register;
   if(refid!=null)
    {
   dolog();
     }
   }                                                     //dis is end of function2

Now the program as per my logic is:

 #!C:/strawberry/perl
 use strict; 
 use warnings; 
 my $filename = 'function_perl.txt';
 my $function_count = 0;
 open(FILENAME,$filename);
 my @arr = join("\n",<FILENAME>);
foreach  my $string(@arr)
  {
 if($string =~/(?:int64|void|boolean)\s?(.*?)\(.*?\)\s*\{/)
     {
 print "HAI";
 $function_count++;
 print '$function_count';
  }
  }

Here $function_count is 1. It never increments for the second match. Please help me with this code. I have been trying for a long time and find it hard to fix.

+3  A: 

Your regex is ill-formed.

^([int64/void/boolean)]/(.*)/\(/\{//\n

You probably meant something like:

/^(int64|void|boolean)\s+(\w+)\s*\(.*?\)\s*\{/

That is, one of int64, void, or boolean, some whitespace, an identifier, optional whitespace, an opening parenthesis, some content, a closing parenthesis, some optional whitespace (which might be a newline), and an opening curly brace.
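As a minimal sketch of how that pattern could be applied, the snippet below runs it over a shortened copy of the sample file held in a string. The variable names ($source, @names) are illustrative only, and the /m and /g modifiers are my additions: /m lets ^ anchor at the start of every line, and /g walks through all matches rather than stopping at the first.

```perl
use strict;
use warnings;

# Shortened copy of the question's sample input, kept in a string
# so the example is self-contained.
my $source = <<'END_C';
#include<main.h>
int reg23;
int64 AccounntBalance(char *reg12,char *refid,char **id){
ref();
}
void AccountRetrivalForm(char **regid,char **balance,char **id)
   {
   doref();
   }
END_C

# Scalar-context m//g finds one match per iteration; $2 is the identifier.
my @names;
while ($source =~ /^(int64|void|boolean)\s+(\w+)\s*\(.*?\)\s*\{/mg) {
    push @names, $2;
}

print scalar(@names), " function(s): @names\n";
```

Because \s* also matches a newline, the second definition is found even though its opening brace sits on the next line.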

Jon Purdy
+3  A: 

I would like to say that the way you go through the file is unusual. Usually you would use something like

open my $handle, '<', $filename;
while (<$handle>) {
    if (/^(void|boolean|int64).../) {
        # do something with the matching line
    }
}
close $handle;

This code opens the file and reads it one line at a time. It does the same job as reading the whole file into an array and iterating over its elements, but holds only one line in memory at a time.
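One likely reason the count in the question stops at 1 is that join returns a single string, so the array has exactly one element and the loop body runs once; a match without /g then succeeds at most once on that string. The sketch below reproduces this with a few made-up input lines (the names @lines, $def, $once and $per_line are illustrative, not from the question).

```perl
use strict;
use warnings;

# Made-up input lines standing in for the file contents.
my @lines = ("int64 a(int x){\n", "}\n", "void b(int y){\n", "}\n");

# The pattern from the question, compiled once.
my $def = qr/(?:int64|void|boolean)\s?(.*?)\(.*?\)\s*\{/;

# join returns ONE string, so @arr holds a single element.
my @arr = join("\n", @lines);

# The loop runs once and the match (no /g) can succeed only once.
my $once = 0;
for my $string (@arr) {
    $once++ if $string =~ $def;
}

# Testing each line separately counts every definition.
my $per_line = 0;
for my $line (@lines) {
    $per_line++ if $line =~ $def;
}

print "joined: $once, per line: $per_line\n";
```

Note that the per-line approach still misses a definition whose opening brace is on the following line, as in the second function of the sample file.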

An unrelated but important hint for using Perl: put the following three use lines right after the shebang line at the beginning of your script:

#!/usr/bin/perl
use strict;
use warnings;
use autodie qw(:all);

The warnings pragma warns you about, for example, using an uninitialized variable in a concatenation. The strict pragma forces you to declare variables with the my keyword. This sounds like a bit of a hassle, but it prevents problems caused by a mistyped variable name: with strict enabled, the Perl interpreter refuses to compile code that uses an undeclared variable. The autodie pragma makes built-ins such as open throw an exception when they fail, so you do not have to check each call yourself.
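A small sketch of what autodie buys you, assuming the path below does not exist on your machine (it is a made-up name). It uses plain use autodie; rather than the :all set shown above, which is enough to cover open and close.

```perl
use strict;
use warnings;
use autodie;    # default set; covers open and close

# Hypothetical path that is assumed not to exist.
my $missing = '/no/such/dir/function.txt';

# Under autodie a failed open throws, so eval returns undef here.
my $opened = eval {
    open my $fh, '<', $missing;
    close $fh;
    1;
};

print "open failed and was reported: $@\n" unless $opened;
```

Without autodie the failed open would return false silently, and the script would carry on reading from an unopened handle.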

The regexp you are using is wrong. Be aware that regexps in Perl are enclosed by slashes by default, so /\s+\w*.*/ is a valid regexp. Your regexp contains unescaped slashes, the first of which closes the expression prematurely. To match slashes in your text, either escape them with a backslash or use a different delimiter altogether.
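To illustrate the two options, here is a short sketch matching a path that contains slashes; the path and variable names are made up for the example.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';

# Option 1: escape every slash with a backslash (works, but hard to read).
my $escaped = $path =~ /^\/usr\/local\//;

# Option 2: m{...} uses braces as delimiters, so slashes need no escaping.
my $braces = $path =~ m{^/usr/local/};

print "both patterns match\n" if $escaped && $braces;
```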

BTW, you also have a typo: @filecontent vs. @file_content. This is a perfect place where use strict; would have warned you.

GorillaPatch
@daxim Thanks for the edits. Why is it better to use a file handle of the form $handle instead of FILENAME ?
GorillaPatch
http://stackoverflow.com/questions/1479741/why-is-three-argument-open-calls-with-lexical-filehandles-a-perl-best-practice
daxim
Thanks for the tip that you provided daxim
Sreeja
@daxim Thanks a lot. Many thanks!
GorillaPatch
+4  A: 

Perhaps this example will help:

use strict;
use warnings;

my $n = 0;

# Supply the input file name as a command-line argument.
# Perl will open the file and process it line by line.
# No need to hard-code the file name in the program, which
# means the script could be reused.
while (my $line = <>){
    # The regex is applied against $line.
    # It will return the items captured by parentheses.
    # The /x option causes Perl to ignore whitespace
    # in our definition of the regex -- for readability.
    my ($type, $func, $args) = $line =~ /^
        ( int64|void|boolean ) \s+
        ( \w+ )                \s*
        \( (.+?) \)
    /x;

    # Skip the line if our regex failed.
    next unless defined $type;

    # Keep track of N of functions, print output, whatever...
    $n ++;
    print $_, "\n" for '', $type, $func, $args;
}

print "\nN of functions = $n\n";
FM
Nice one! I like it how you directly name the matches of the regexp. Makes the code so much more readable. I learned something today. Thanks! +1 for that.
GorillaPatch
+1  A: 

This will do the trick:

#!/usr/bin/env perl
use strict;
use warnings;
use autodie qw(:all);

my $function_count = 0;
open my $input, '<', 'function.txt';
while (defined(my $line = <$input>)) {
    chomp($line);

    if (my ($func) = $line =~ /^(?:int64|void|boolean)\s?(.*?)\(/) {
        print qq{Found function "$func"\n};
        $function_count++;
    }
}
close $input;
print "$function_count\n";

Revised answer taking into account the function calls:

#!/usr/bin/env perl
use strict;
use warnings;
use autodie qw(:all);

my $document;
{
    local $/ = undef;
    open my $input, '<', 'function.txt';
    $document = <$input>;
    chomp $document;
    close $input;
}

my $function_count = 0;
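# Match in scalar context: a list-context m//g restarts on every pass
# and would loop forever. The captured name is read from $1.
while ($document =~ /(?:int64|void|boolean)\s?(.*?)\(.*?\)\s*\{/gs) {
    my $func = $1;
    print qq{Found function "$func"\n};
    $function_count++;
}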

print "$function_count\n";
Narthring
No need to chomp the line I think because you are not matching against the end of the line.
GorillaPatch
Very true, just added the chomp as a force of habit.
Narthring
@Narthring Thanks. But the above code also counts the function call, e.g. int64 AccounntBalance(char *reg12,char *refid,char **id);
Sreeja
+3  A: 

Regular expressions are not parsers. It's always better to use a parser if you can.
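To make that concrete, here is a small sketch with a made-up C fragment (not from the question) in which a line-anchored pattern of the kind suggested in the other answers goes wrong twice: it reports a function that only exists inside a comment, and it misses a real definition whose return type is not in its list.

```perl
use strict;
use warnings;

# Made-up C source: one commented-out function, one real function.
my $source = <<'END_C';
/* old version:
void helper(int x) {
}
*/
unsigned long
total(int n)
{
    return n;
}
END_C

my @found;
while ($source =~ /^(int64|void|boolean)\s+(\w+)\s*\(.*?\)\s*\{/mg) {
    push @found, $2;
}

# The regex sees only the commented-out helper and never finds total.
print "regex found: @found\n";
```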

A simple approach is to lean on the parser in ctags:

#! /usr/bin/perl

use warnings;
use strict;

sub usage { "Usage: $0 source-file\n" }

die usage unless @ARGV == 1;

open my $ctags, "-|", "ctags", "-f", "-", @ARGV
  or die "$0: failed to start ctags\n";

while (<$ctags>) {
  chomp;
  my @fields = split /\t/;
  next unless $fields[-1] eq "f";
  print $fields[0], "\n";
}

Sample run:

$ ./getfuncs prog.cc
AccounntBalance
AccountRetrivalForm
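If ctags is not at hand, the filtering step can still be tried on canned input. The lines below are hand-written in the usual tags layout (name, file, search pattern, kind, separated by tabs); they are an assumption for illustration, not captured ctags output. The sketch also prints the count that the question asked for.

```perl
use strict;
use warnings;

# Hand-written lines in the tags layout: name, file, search pattern, kind.
my @tag_lines = (
    join("\t", 'AccounntBalance', 'prog.cc',
         '/^int64 AccounntBalance(char *reg12,char *refid,char **id){$/;"', 'f'),
    join("\t", 'reg23', 'prog.cc', '/^int reg23;$/;"', 'v'),
    join("\t", 'AccountRetrivalForm', 'prog.cc',
         '/^void AccountRetrivalForm(char **regid,char **balance,char **id)$/;"', 'f'),
);

my @functions;
for my $line (@tag_lines) {
    my @fields = split /\t/, $line;
    next unless $fields[-1] eq 'f';    # keep only the "function" kind
    push @functions, $fields[0];
}

print "$_\n" for @functions;
print scalar(@functions), " function definition(s)\n";
```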

Another approach involves g++'s option -fdump-translation-unit that causes it to dump a representation of the parse tree, and you could dig through it as in the following example.

We begin with the usual front matter:

#! /usr/bin/perl

use warnings;
use strict;

Processing requires the name of the source file and any necessary compiler flags.

sub usage { "Usage: $0 source-file [ cflags ]\n" }

The translation-unit dump has a straightforward format:

@1      namespace_decl   name: @2       srcp: :0      
                         dcls: @3      
@2      identifier_node  strg: ::       lngt: 2       
@3      function_decl    name: @4       mngl: @5       type: @6      
                         srcp: prog.c:12               chan: @7      
                         args: @8       link: extern  
@4      identifier_node  strg: AccountRetrivalForm     lngt: 19      

As you can see, each record begins with an identifier, followed by a type, and then one or more attributes. Regular expressions and a bit of hash twiddling are sufficient to give us a tree to inspect.

sub read_tu {
  my($path) = @_;
  my %node;

  open my $fh, "<", $path or die "$0: open $path: $!";
  my $tu = do { local $/; <$fh> };

  my $attrname = qr/\b\w+(?=:)/;
  my $attr =
    qr/($attrname): \s+ (.+?)      # name-value
       (?= \s+ $attrname | \s*$ )  # terminated by whitespace or EOL
      /xm;

  my $fullnode =
    qr/^(@\d+) \s+ (\S+) \s+  # id and type
        ((?: $attr \s*)+)     # one or more attributes
        \s*$                  # consume entire line
      /xm;

  while ($tu =~ /$fullnode/g) {
    my($id,$type,$attrs) = ($1,$2,$3);

    $node{$id} = { TYPE => $type };
    while ($attrs =~ /$attr \s*/gx) {
      if (exists $node{$id}{$1}) {
        $node{$id}{$1} = [ $node{$id}{$1} ] unless ref $node{$id}{$1};
        push @{ $node{$id}{$1} } => $2;
      }
      else {
        $node{$id}{$1} = $2;
      }
    }
  }

  wantarray ? %node : \%node;
}

In the main program, we feed the code to g++

die usage unless @ARGV >= 1;

my($src,@cflags) = @ARGV;
system("g++", "-c", "-fdump-translation-unit", @cflags, $src) == 0
  or die "$0: g++ failed\n";

my @tu = glob "$src.*.tu";
unless (@tu == 1) {
  die "$0: expected one $src.*.tu file, but found",
      @tu ? ("\n", map("  - $_\n", @tu))
          : " none\n";
}

Assuming all went well, we then pluck out the function definitions given in the specified source file.

my $node = read_tu @tu;

sub isfunc {
  my($n) = @_;
  $n->{TYPE} eq "function_decl"
             &&
  index($n->{srcp}, "$src:") == 0;
}

sub nameof {
  my($n) = @_;
  return "<undefined>" unless exists $n->{name};
  $n->{name} =~ /^@/
    ? $node->{ $n->{name} }{strg}
    : $n->{name};
}

print "$_\n" for sort
                 map nameof($_),
                 grep isfunc($_),
                 values %$node;

Example run:

$ ./getfuncs prog.cc -I.
AccounntBalance
AccountRetrivalForm
Greg Bacon
That's the way to do it. I think that you should put the ctags way in front, though.
Svante
@Svante Good suggestion! Updated.
Greg Bacon
I agree with you. This is clearly a C-based language he's trying to deal with, and it's just too easy to break any regex with some perfectly valid function declarations. Better to use a tool that understands the language.
Mike Ellery