
The following code generates a list of the average number of clients connected per subnet. Currently I have to pipe its output through sort | uniq | grep -v HASH.

I'm trying to keep it all in Perl, but this doesn't work:

foreach $subnet (keys %{keys %{keys %days}}) {
    print "$subnet\n";
}
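For reference, the nested keys attempt above can't work as written. Inside %{ ... } the keys call is evaluated in scalar context, so it returns a count rather than a list of keys, and the outer dereference then looks at a symbolic hash named after that number instead of at the data. A minimal sketch of listing each subnet once, assuming %days is a hash of hashes keyed by date and then subnet (as the source below builds it), is to walk both levels and collect the inner keys in a %seen hash. The sample data and the unique_subnets name are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data shaped like %days: date => { subnet => client count }
my %days = (
    20090101 => { '10.0.1' => 12, '10.0.2' => 7 },
    20090102 => { '10.0.1' => 15, '10.0.3' => 4 },
);

# Collect every subnet key from the inner hashes; a hash keeps each key only once
sub unique_subnets {
    my ($days) = @_;
    my %seen;
    for my $day (keys %$days) {
        $seen{$_} = 1 for keys %{ $days->{$day} };
    }
    return sort keys %seen;
}

print "$_\n" for unique_subnets(\%days);
```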

Here is the full source:

foreach $file (@ARGV) {
        open(FH, $file) or warn("Can't open file $file\n");
        if ($file =~ /(2009\d{4})/) {
            $dt = $+;
        }
        %hash = {};
        while(<FH>) {
            @fields = split(/~/);
            $subnet = $fields[0];
            $client = $fields[2];
            $hash{$subnet}{$client}++;
        }
        close(FH);
        $file = "$dt.csv";
        open(FH, ">$file") or die("Can't open $file for output");
        foreach $subnet (sort keys %hash) {
                $tot = keys(%{$hash{$subnet}});
                $days{$dt}{$subnet} = $tot;
                print FH "$subnet,$tot\n";
                push @{$subnet}, $tot;
        }
        close(FH);
    }

    foreach $day (sort keys %days) {
        foreach $subnet (sort keys %{$days{$day}}) {
            $tot = $i = 0 ;
            foreach $amt (@{$subnet}) {
                $i++;
                $tot += $amt;
            }
            print "$subnet," . int($tot/$i) . "\n";
        }
    }
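The HASH(0x...) lines that grep -v filters out most likely come from the line %hash = {};. The braces build an anonymous hash reference, so the assignment stores that reference, stringified, as a single key with an undef value (and warns "Reference found where even-sized list expected" under use warnings). That key then shows up in keys %hash and is written out as if it were a subnet. Emptying a hash is written %hash = ();. A small sketch showing the difference, with illustrative variable names:

```perl
use strict;
use warnings;

my (%wrong, %right);
{
    no warnings;     # silence "Reference found where even-sized list expected"
    %wrong = {};     # one key: the stringified hashref, e.g. "HASH(0x...)"
}
%right = ();         # a genuinely empty hash

my @wrong_keys = keys %wrong;
print scalar(@wrong_keys), " key(s) in \%wrong: @wrong_keys\n";
print scalar(keys %right), " key(s) in \%right\n";
```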

How can I eliminate the need for the sort | uniq step outside of Perl? The last foreach loop gives me the subnet IDs, which are also the names of the 'anonymous' arrays, but it generates each one multiple times (once for each day that subnet was used).

A:

"...but this seemed easier than combining spreadsheets in Excel."

Actually, modules like Spreadsheet::ParseExcel make that really easy in most cases. You still have to deal with rows as you would with CSV, or with "A1"-style cell addressing, but you skip the export step. And then you can write the output with Spreadsheet::WriteExcel!

I've used these modules to read a spreadsheet of a few hundred checks, sort and arrange and mung the contents, and write to a new one for delivery to an accountant.


In this part:

foreach $subnet (sort keys %hash) {
        $tot = keys(%{$hash{$subnet}});
        $days{$dt}{$subnet} = $tot;
        print FH "$subnet,$tot\n";
        push @{$subnet}, $tot;
}

$subnet is a string, but the last statement uses it as an array reference. Since you don't have strict enabled, Perl treats it as a soft (symbolic) reference to the package array whose name is the contents of $subnet. That's okay if you really want it, but it's confusing. As for clarifying the last part...

Update: I'm guessing this is what you're looking for, where a subnet's value is saved only if it hasn't appeared before, even on a different day (?):

use List::Util qw(sum); # List::Util was first released with perl 5.007003 (5.7.3, I think)
my %buckets;
foreach my $day (sort keys %days) {
    foreach my $subnet (sort keys %{$days{$day}}) {
        next if exists $buckets{$subnet}; # only gives you this value once, regardless of what day it came in
        my $total = sum @{$subnet}; # no need to reuse a variable
        $buckets{$subnet} = int($total/@{$subnet}); # array in scalar context is number of elements
    }
}

use Data::Dumper qw(Dumper);
print Dumper \%buckets;
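To illustrate the soft-reference point above: with use strict in effect, push @{$subnet}, $tot fails at run time because $subnet holds a plain string. A minimal sketch, with made-up values, of that failure and of a hash of arrays doing the same job without symbolic names (%by_subnet is a hypothetical name):

```perl
use strict;
use warnings;

my $subnet = '10.0.1';    # a plain string, as in the original loop
my $tot    = 12;

# Under strict refs, using a string as an array reference is a runtime error
my $ok = eval { push @{$subnet}, $tot; 1 };
print "soft reference rejected: $@" unless $ok;

# The same data kept in a hash of arrays (%by_subnet is a hypothetical name)
my %by_subnet;
push @{ $by_subnet{$subnet} }, $tot;    # autovivifies an array ref for this key
push @{ $by_subnet{$subnet} }, 15;
print "$subnet: @{ $by_subnet{$subnet} }\n";    # 10.0.1: 12 15
```

Because the arrays live under hash keys, each subnet appears only once in keys %by_subnet.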
Anonymous
Seeing as I'm generating the CSV files, I thought it would be easier to eliminate an extra step. Thanks for the links though.
Scott Hoffman
@Anonymous, that is cleaner, but it still gives me 1667 records that after passing through sort and uniq come down to 196. What I would like is a way to access the subnet values once ...
Scott Hoffman
@Anonymous, this is much better, but still gives me the HASH(0xab5954) types of answers. I'm thinking the best thing to do would be to eliminate the anonymous arrays somehow.
Scott Hoffman
A: 

Building on @Anonymous's suggestions, I built a hash of the subnet names to access the arrays:

..

            push @{$subnet}, $tot;
            $subnets{$subnet}++;
    }
    close(FH);
}

use List::Util qw(sum); # List::Util was first released with perl 5.007003

foreach my $subnet (sort keys %subnets) {
    my $total = sum @{$subnet}; # no need to reuse a variable
    print "$subnet," . int($total/@{$subnet}) . "\n"; # array in scalar context is number of elements
}
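For comparison, here is a sketch of the same idea with the per-subnet arrays stored in a single hash of arrays, instead of symbolically named arrays plus a separate %subnets counter. Because hash keys are unique, each subnet is averaged and printed exactly once, so no sort | uniq pass should be needed. The input records are made up and stand in for the per-day client counts ($tot) computed from each file; %counts and average_by_subnet are hypothetical names:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical per-day results: [date, subnet, distinct clients that day]
my @daily = (
    [ 20090101, '10.0.1', 12 ],
    [ 20090101, '10.0.2',  7 ],
    [ 20090102, '10.0.1', 15 ],
    [ 20090102, '10.0.3',  4 ],
);

# subnet => [ one client count per day the subnet appeared ]
my %counts;
push @{ $counts{ $_->[1] } }, $_->[2] for @daily;

# Average the daily counts per subnet, truncated with int() as in the original
sub average_by_subnet {
    my ($counts) = @_;
    my %avg;
    for my $subnet (keys %$counts) {
        my @days = @{ $counts->{$subnet} };
        $avg{$subnet} = int( sum(@days) / @days );    # @days in scalar context = number of days
    }
    return \%avg;
}

my $avg = average_by_subnet(\%counts);
print "$_,$avg->{$_}\n" for sort keys %$avg;
```

Keeping the arrays in one hash also lets the script run under use strict, since nothing relies on symbolic names.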

Not sure if this is the best solution, but I don't have the duplicates any more.

Scott Hoffman