ansaurus

Question

Parsing comma separated lines and calculating sum

Answer 1

A:

The way you explained your problem is not very clear. Based on my understanding, this would be my approach:

Create a 2d array containing the different comma delimited fields maintaining the row, column structure.
Analyze each column and create a hash mapping the each data value to the rows containing it.

IE: For column one you'd have a hash
3=5002 0,1
3=5001 2,3
3=5003 4,5

Then you go through each entry of the hash and sum the last member of the rows listed for the different data.
Repeat for each column excluding the last.

dborba 2009-06-24 19:18:42

Answer 2

+1 A:

How about splitting on ",". Then you can pull off the last element and pair it with each element from the list. For your first line you would end up with the following pairs:

3=5002, 200
0=10002, 200
5=1, 200
4=1, 200
7=1, 200
8=1, 200
9=0, 200
1=14002, 200
6=5, 200

Add each one of those pairs to a master list. Once you get that you can sort by the first element in the pair and sum.

mamboking 2009-06-24 19:20:45

Add a reference to Text::CSV_XS, and this is the perfect response, IMO.

Tanktalus 2009-06-24 19:35:38

@Tanktalus There is no need for Text::CSV_XS or Text::XSV in this case.

Sinan Ünür 2009-06-24 19:42:16

Answer 3

+2 A:

Based on my understanding, here are two possible methods. The first one uses composite keys to store values in a single level hash. The second one uses a multi-level hash:

Method 1:

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw( sum );

my %data;

while ( my $line = <DATA> ) {
    chomp $line;

    my @parts = split /, /, $line;
    last unless @parts;

    my $value = pop @parts;

    push @{ $data{$_} }, $value for @parts;
}

for my $col ( sort keys %data ) {
    printf("%12s:%9d\n", $col, sum @{ $data{$col} } );
}

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

C:\Temp> hj
  3=5001:    11000
  3=5002:      500
  3=5003:     6000
 0=10001:     1000
 0=10002:      500
 0=10004:    16000
 1=14001:    11000
 1=14002:      500
 1=14003:     6000
     4=0:     6000
     4=1:    11500
     5=0:     1300
     5=1:    10200
     5=2:     5000
     5=3:     1000
     6=3:    11000
     6=5:      500
     6=8:     6000
     7=0:     1300
     7=1:      200
     7=2:    16000
     8=0:     1300
     8=1:     1200
     8=2:    15000
     9=0:     2200
     9=1:    15300

Method: 2

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw( sum );

my %data;

while ( my $line = <DATA> ) {
    chomp $line;

    my @parts = split /, /, $line;
    last unless @parts;

    my $value = $parts[-1];

    for ( my $i = 0 ; $i < @parts - 2; ++$i ) {
        my @subparts = split /=/, $parts[$i];
        push @{ $data{$subparts[0]}->{$subparts[1]} }, $value;
    }
}

for my $k1 ( keys %data ) {
    for my $k2 ( keys %{ $data{$k1} } ) {
        printf(
            "%2d:%6d:%9d \n",
            $k1, $k2, sum @{ $data{$k1}->{$k2} }
        );
    }
}

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

C:\Temp> hjk
 3:  5003:     6000
 3:  5002:      500
 3:  5001:    11000
 7:     1:      200
 7:     0:     1300
 7:     2:    16000
 9:     1:    15300
 9:     0:     2200
 8:     1:     1200
 8:     0:     1300
 8:     2:    15000
 4:     1:    11500
 4:     0:     6000
 1: 14001:    11000
 1: 14003:     6000
 1: 14002:      500
 0: 10001:     1000
 0: 10004:    16000
 0: 10002:      500
 5:     1:    10200
 5:     3:     1000
 5:     0:     1300
 5:     2:     5000

NB: Add sort to taste.

Sinan Ünür 2009-06-24 19:22:06

Just wondering why you are using the shebang `#!/usr/bin/perl` under Windows, Sinan.

Alan Haggai Alavi 2009-06-24 19:31:22

@Alan because Windows does not pay attention to the interpreter path but I can still specify flags on the shebang line. By keeping it a relatively standard `/usr/bin/perl`, I can run the same script in cmd.exe using ActiveState or Strawberry perl or Cygwin perl in the Bash shell or in ArchLinux on those occasions when I dual boot. Finally, I can upload and test the script to any other *nix machine I have access to without having to make changes.

Sinan Ünür 2009-06-24 19:41:01

@Sinan, I see. Nice idea. Using Arch Linux too. :-)

Alan Haggai Alavi 2009-06-25 02:39:10

@Alan I must say, I am very fond of Arch. Everything just works. No fuss.

Sinan Ünür 2009-06-25 02:46:35

Answer 4

A:

I hope this is what you are looking for:

#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV_XS;

my %data;
my $csv = Text::CSV_XS->new();
while ( <DATA> ) {
    $csv->parse($_);
    my @fields = $csv->fields();
    $fields[0] =~ s/^3=//;
    $data{ $fields[0] } += $fields[9];
}

use Data::Dumper;
print Dumper \%data;

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

Alan Haggai Alavi 2009-06-24 19:29:55

Answer 5

A:

Well, it seems that everybody is trying to understand what you really want. I don't understand it, but it seems that you only want to capture the sum of all lines that contain a given key=value pair. Except, you don't actually care for the key.

Or something like that.

So, my question would be: can you provide the expected output for the example set of data?

Anyways, here's my try (the '#/' comments are just to help the syntax highlighter.)

#!/usr/bin/perl
use strict;
use warnings;
my %h;
my @ord_keys;
while (<DATA>) {
    chomp;
    my @cols = split /,\s*/; #/
    my $val = pop @cols;

    foreach my $k (@cols) {
        if (exists($h{$k})) {
            $h{$k} += $val;
        } else {
            push @ord_keys, $k;
            $h{$k} = $val;
        }
    }
}

foreach my $key (@ord_keys) {
    my ($k, $v) = split /=/, $key; #/
    print "$k = $v, sum = $h{$key}\n";
}

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

And the results:

3 = 5002, sum = 500
0 = 10002, sum = 500
5 = 1, sum = 10200
4 = 1, sum = 11500
7 = 1, sum = 200
8 = 1, sum = 1200
9 = 0, sum = 2200
1 = 14002, sum = 500
6 = 5, sum = 500
5 = 0, sum = 1300
7 = 0, sum = 1300
8 = 0, sum = 1300
9 = 1, sum = 15300
3 = 5001, sum = 11000
0 = 10001, sum = 1000
4 = 0, sum = 6000
1 = 14001, sum = 11000
6 = 3, sum = 11000
0 = 10004, sum = 16000
7 = 2, sum = 16000
8 = 2, sum = 15000
3 = 5003, sum = 6000
5 = 2, sum = 5000
1 = 14003, sum = 6000
6 = 8, sum = 6000
5 = 3, sum = 1000

Comments welcome.

Leonardo Herrera 2009-06-24 20:53:06

ansaurus

tags:

views:

answers:

Parsing comma separated lines and calculating sum

related questions