ansaurus

Question

Search for occurrences of contents of a file in another file

Answer 1


One obvious improvement is to read all the category files first, in a separate loop, and cache their words in a hash of arrays (keyed by filename), or in a single flat collection if you don't care which search word came from which file. The code below takes the flat route and stores each word as a hash key whose value is a running count.

This avoids re-reading the search files for every line of every $file, and it removes duplicate search words as a bonus.

use strict;
use warnings;
use File::Slurp qw(read_file);

open my $output, '>>', 'D:/output.txt' or die "Can't open D:/output.txt: $!";

my %categories = ();
my @files      = glob('folder1/*');
my @categories = glob('folder2/*');
foreach my $categoryName (@categories) {
    my @lines = read_file($categoryName);
    foreach my $category (@lines) {
        chomp($category);
        next if $category eq '';       # ignore blank lines
        $categories{$category} = 0;    # hash keys are unique, so duplicates collapse here
    }
}

foreach my $file (@files) {
    open my $fileh, '<', $file or die "Can't open file $file: $!";
    while (my $line = <$fileh>) {
        foreach my $category (keys %categories) {
            # count occurrences of $category in $line
        }
    }
    close $fileh;
    # output the counts for $file to $output
}
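
If you do need to know which category file each search word came from, the hash-of-arrays option might look like the sketch below. It assumes one search word per line; load_categories is a hypothetical helper name, and the demo writes two throwaway files with File::Temp instead of reading folder2/.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a hash of arrays: category filename => unique search words in that file.
sub load_categories {
    my (@category_files) = @_;
    my %words_by_file;
    foreach my $path (@category_files) {
        open my $fh, '<', $path or die "Can't open $path: $!";
        my %seen;
        while (my $word = <$fh>) {
            chomp $word;
            next if $word eq '' || $seen{$word}++;    # skip blank lines and duplicates
            push @{ $words_by_file{$path} }, $word;
        }
        close $fh;
    }
    return %words_by_file;
}

# Demo: two hypothetical category files in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my %content = (animals => "cat\ndog\ncat\n", colors => "red\n\nblue\n");
my @paths;
foreach my $name (sort keys %content) {
    my $path = "$dir/$name";
    open my $out, '>', $path or die "Can't write $path: $!";
    print {$out} $content{$name};
    close $out;
    push @paths, $path;
}

my %words_by_file = load_categories(@paths);
foreach my $path (sort keys %words_by_file) {
    print "$path: @{ $words_by_file{$path} }\n";    # e.g. ".../animals: cat dog"
}
```

If you later want to search for every word regardless of origin, you can flatten it with something like `my %all = map { $_ => 0 } map { @$_ } values %words_by_file;`.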

Also, if these are real "words" (meaning a category of "cat" should match "cat dog" but not "mcat"), I would count word usage by splitting each line rather than using a regex:

while (my $line = <$fileh>) {
    # split ' ' skips leading whitespace, so there is no empty first field
    foreach my $word (split ' ', $line) {
        $categories{$word}++ if exists $categories{$word};
    }
}
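
A small self-contained comparison of the two counting styles, using made-up sample lines instead of real files. Splitting counts "cat" only as a whole word, while an unanchored regex also counts it inside "mcat". It uses split ' ' rather than split /\s+/, so a line that starts with spaces does not produce an empty first field.

```perl
use strict;
use warnings;

my @search_words = qw(cat dog);
my @lines = ("cat dog\n", "mcat  and a dog\n", "  cat\n");    # hypothetical input

# Whole-word counting: split each line on whitespace and look words up in a hash.
my %word_count = map { $_ => 0 } @search_words;
foreach my $line (@lines) {
    foreach my $word (split ' ', $line) {
        $word_count{$word}++ if exists $word_count{$word};
    }
}

# Substring counting with a plain regex, for comparison: "mcat" also counts as "cat".
my %substr_count = map { $_ => 0 } @search_words;
foreach my $line (@lines) {
    foreach my $word (@search_words) {
        $substr_count{$word}++ while $line =~ /\Q$word\E/g;
    }
}

# cat: split=2 regex=3; dog: split=2 regex=2
printf "%-4s split=%d regex=%d\n", $_, $word_count{$_}, $substr_count{$_}
    for @search_words;
```

If you would rather keep a regex, anchoring it with \b (for example /\b\Q$word\E\b/) gives a similar whole-word match, although \b also matches "cat" in "cat," where whitespace splitting would see the token "cat,".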

DVK 2010-09-08 14:50:54
