I have a Perl codebase with a lot of redundant functions spread across many files.

Is there a convenient way to identify those redundant functions? Is there a simple tool that can check the codebase for this?

+3  A: 

It may not be convenient, but the best tool for this is your brain. Go through all the code and get an understanding of its interrelationships. Try to see the common patterns. Then, refactor!

I've tagged your question with "refactoring". You may find some interesting material on this site filed under that subject.

Ether
I like the stock-room style refactor: 1. Check the code into git. 2. Make sure you have lots of tests. 3. Rename the module in its entirety to something unusable. 4. Create an empty one. 5. Copy functions back, refactoring them as you go, until the tests pass again. 6. After a month of not needing to copy functions from the old module, remove it.
Kent Fredric
@Ether: you're right about refactoring... but first I need to locate the functions. :)
someguy
A: 

If you are on Linux, you might use grep to list all of the functions in your codebase. You will probably still need to do what Ether suggests and really go through the code to understand it, if you haven't already.

Here's an over-simplified example:

grep -r "sub " codebase/* > function_list

You can look for duplicates this way too. This idea may be less effective if you are using Perl's OOP capability.
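For example, this rough pipeline prints sub names that are defined more than once (it assumes plain "sub name" declarations at the start of a line, so it will miss anything fancier):

grep -rh "^sub " codebase/ | awk '{print $2}' | sort | uniq -d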

It might also be worth mentioning NaturalDocs, a code documentation tool. This will help you going forward.

Chris Kloberdanz
If you're working with Perl, consider using `ack`, a pure-Perl version of `grep` that takes advantage of Perl's more powerful regex support.
Chris Lutz
+8  A: 

You could use the B::Xref module to generate cross-reference reports.
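It is invoked through perl's -MO switch; something like the following writes a cross-reference report for one script to a file (xref.txt and myscript.pl are placeholder names):

perl -MO=Xref,-oxref.txt myscript.pl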

David Harris
I was looking for something like this...
someguy
+7  A: 

I've run into this problem myself in the past, so I slapped together a quick little program that uses PPI to find subroutines. It normalizes the code a bit (collapses whitespace, strips comments) and reports any duplicates. It works reasonably well; PPI does all the heavy lifting.

You could make the normalization a little smarter by normalizing all variable names in each routine to $a, $b, $c and maybe doing something similar for strings. Depends on how aggressive you want to be; there's a rough sketch of that idea after the program.

#!perl

use strict;
use warnings;

use PPI;

my %Seen;

for my $file (@ARGV) {
    my $doc = PPI::Document->new($file)
        or do { warn "Could not parse $file\n"; next };
    $doc->prune("PPI::Token::Comment");         # strip comments

    my $subs = $doc->find('PPI::Statement::Sub') || [];
    for my $sub (@$subs) {
        my $block = $sub->block
            or next;                            # skip forward declarations
        my $code = $block->content;
        $code =~ s/\s+/ /g;                     # normalize whitespace
        next if $code =~ /^\{\s*\}$/;           # ignore empty routines

        if( $Seen{$code} ) {
            printf "%s in %s is a duplicate of %s\n", $sub->name, $file, $Seen{$code};
        }
        else {
            $Seen{$code} = sprintf "%s in %s", $sub->name, $file;
        }
    }
}
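A rough, untested sketch of that variable normalization (normalize_vars is just illustrative, not part of the program above); you would compare normalize_vars($block) instead of the raw block text:

sub normalize_vars {
    my ($block) = @_;
    my %map;
    my $n = 0;

    # Rename each variable to its sigil plus a counter, in order of first
    # appearance, so two routines that differ only in variable names compare
    # equal. Note this edits the parsed document in place.
    for my $sym (@{ $block->find('PPI::Token::Symbol') || [] }) {
        my $name = $sym->content;
        $map{$name} ||= substr($name, 0, 1) . 'v' . ++$n;
        $sym->set_content($map{$name});
    }

    return $block->content;
}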
Schwern
Nice! But why do you need to ignore empty routines?
innaM