I am writing a comparefiles subroutine in Perl that reads a line of text from one file (f1) and then searches for it in another (f2) in the normal O(n^2) way.

sub comparefiles {
    my($f1, $f2) = @_;
    while(<f1>) {
        # reset f2 to the beginning of the file
        while(<f2>) {
        }
    }
}

sub someother {
    open (one, "<one.out");
    open (two, "<two.out");
    &comparefiles(&one, &two);
}

I have two questions:

  • How do I pass the file handles to the subroutine? In the above code, I have used them as scalars. Is that the correct way?
  • How do I reset the file pointer f2 to the beginning of the file at the position marked in the comment above?
+9  A: 

First of all, start every script with:

use strict;
use warnings;

Use lexical filehandles and the three-argument form of open, and check the result:

open my $fh1, '<', $filename1 or die "can't open '$filename1' for reading: $!";

Then you can pass the filehandles to the sub:

comparefiles($fh1, $fh2);

To rewind the file, use the seek function (see perldoc -f seek):

seek $fh, 0, 0;
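
Putting those pieces together, a minimal sketch of the comparefiles sub from the question could look like the following. The return value (a list of the lines found in both files), the chomp calls, and taking the two file names from the command line are my assumptions, not something the question specifies.

```perl
use strict;
use warnings;

# Returns the lines of the first file that also occur in the second.
# Both arguments are lexical filehandles already opened for reading.
sub comparefiles {
    my ($fh1, $fh2) = @_;
    my @common;
    while (my $line1 = <$fh1>) {
        chomp $line1;
        # Reset the second handle to the beginning of its file.
        seek $fh2, 0, 0 or die "can't rewind second file: $!";
        while (my $line2 = <$fh2>) {
            chomp $line2;
            if ($line1 eq $line2) {
                push @common, $line1;
                last;    # stop scanning once a match is found
            }
        }
    }
    return @common;
}

# Assumed usage: perl compare.pl one.out two.out
if (@ARGV == 2) {
    my ($name1, $name2) = @ARGV;
    open my $fh1, '<', $name1 or die "can't open '$name1' for reading: $!";
    open my $fh2, '<', $name2 or die "can't open '$name2' for reading: $!";
    print "$_\n" for comparefiles($fh1, $fh2);
    close $fh1;
    close $fh2;
}
```

The handles are ordinary scalars, so they are passed like any other argument, and the seek call sits exactly where the comment in the question's inner loop was.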
M42
+2  A: 

If the files are small enough to fit in memory, you might consider storing the lines in a hash, which would prevent the need for O(n^2) searching.
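
As a rough sketch of the hash idea, assuming the second file fits in memory and that you only need to know which lines of the first file also appear in the second: the name common_lines and the command-line handling are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines read from $fh1 that also
# occur somewhere in $fh2. Each file is read exactly once.
sub common_lines {
    my ($fh1, $fh2) = @_;

    # Remember every line of the second file as a hash key.
    my %seen;
    while (my $line = <$fh2>) {
        chomp $line;
        $seen{$line} = 1;
    }

    # A hash lookup replaces the inner scan of the second file.
    my @common;
    while (my $line = <$fh1>) {
        chomp $line;
        push @common, $line if $seen{$line};
    }
    return @common;
}

# Assumed usage: perl common.pl one.out two.out
if (@ARGV == 2) {
    open my $fh1, '<', $ARGV[0] or die "can't open '$ARGV[0]' for reading: $!";
    open my $fh2, '<', $ARGV[1] or die "can't open '$ARGV[1]' for reading: $!";
    print "$_\n" for common_lines($fh1, $fh2);
}
```

Building the hash costs one pass over the second file, and each lookup afterwards takes roughly constant time, so the total work grows with the combined size of the two files rather than with their product.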

Within the framework of your existing approach, I would advise against nesting your file reading loops -- perhaps on aesthetic grounds if nothing else. Instead, put the inner loop in a subroutine.

use strict;
use warnings;

# Works for 2 or more files.
analyze_files(@ARGV);

sub analyze_files {
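    my @file_names = @_;
    # Open every file, dying with the system error if one cannot be read.
    my @handles = map {
        open my $h, '<', $_ or die "can't open '$_' for reading: $!";
        $h;
    } @file_names;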
    my $fh = shift @handles;

    while (my $line = <$fh>) {
        my @line_numbers = map { find_in_file($_, $line) } @handles;
        print join("\t", @line_numbers, $line);
    }
}

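# Takes a file handle and a line to hunt for.
# Returns the 1-based line number if the line is found, or -1 otherwise.
sub find_in_file {
    my ($fh, $find_this) = @_;
    seek $fh, 0, 0;        # rewind to the start of the file
    my $line_number = 0;   # count explicitly: seek does not reset $.
    while (my $line = <$fh>) {
        $line_number++;
        return $line_number if $line eq $find_this;
    }
    return -1; # Not found.
}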
FM
helpful! thanks @FM.
Lazer