



I need to find the average and standard deviation of a large amount of data in this format. I tried using Excel, but there doesn't appear to be an easy way to transpose the columns. What am I missing in Excel, or should I just use Perl?

Input file format is:

0 123
0 234
0 456
1 657
1 234
1 543

I want the result to group the averages and standard deviations by the value in the first column:

0 AvgOfAllZeros StdDevOfAllZeros
1 AvgOfAllOnes StdDevOfAllOnes

Using the Statistics::Descriptive CPAN module, you can do it like this:

use strict;
use warnings;
use Statistics::Descriptive;

my ($file) = @ARGV;

my @zeroes;
my @ones;

# Reading it in
open my $fh, '<', $file or die "unable to open '$file', $!";

while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank lines
    my ($value, $number) = split ' ', $line;
    if ($value) {
        push @ones, $number;
    }
    else {
        push @zeroes, $number;
    }
}
close $fh or warn "Can't close fh! $!";

# Stat processing
my $stat_zeroes = Statistics::Descriptive::Full->new();
my $stat_ones   = Statistics::Descriptive::Full->new();
$stat_zeroes->add_data(@zeroes);
$stat_ones->add_data(@ones);

print "0: ", $stat_zeroes->mean(), " ", $stat_zeroes->standard_deviation(), "\n",
      "1: ", $stat_ones->mean(),   " ", $stat_ones->standard_deviation(),   "\n";
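Have you tried Excel's AVERAGEIF function (Excel 2007 and later)? Assuming the keys are in column A and the values in column B, =AVERAGEIF(A:A, 0, B:B) averages the values in column B for rows whose key is 0. There is no matching STDEVIF function, but an array formula such as =STDEV(IF(A1:A6=0, B1:B6)), confirmed with Ctrl+Shift+Enter, gives the standard deviation for one group.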

This is easy to do in R. If your data is in a file called foo, then this code will do the trick:

> data <- read.table("foo")
> cbind(avg=with(data, tapply(V2, V1, mean)),
+       stddev=with(data, tapply(V2, V1, sd)))
  avg   stddev
0 271 169.5553
1 478 218.8630
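As a hand check of the R output, here is the arithmetic for both groups. It uses the sample standard deviation (dividing by n - 1), which is what R's sd() computes:

```latex
\begin{aligned}
\bar{y}_0 &= \frac{123 + 234 + 456}{3} = 271, &
s_0 &= \sqrt{\frac{(-148)^2 + (-37)^2 + 185^2}{3 - 1}} = \sqrt{\frac{57498}{2}} = \sqrt{28749} \approx 169.5553 \\
\bar{y}_1 &= \frac{657 + 234 + 543}{3} = 478, &
s_1 &= \sqrt{\frac{179^2 + (-244)^2 + 65^2}{3 - 1}} = \sqrt{\frac{95802}{2}} = \sqrt{47901} \approx 218.8630
\end{aligned}
```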
If you are dealing with a large set of data, you should consider PDL, the Perl Data Language.

If you do this manually in Excel, you can copy the data and then paste it using the Paste Special menu option, which has a Transpose check box.

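If you do this more frequently, here is a Perl script. It makes one pass over the input and keeps only a count, a sum, and a sum of squares per key, so memory use is linear in the size of the output (the number of distinct keys in the first column), which is constant when there are only two groups: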


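use strict;
use warnings;

my (%count, %sum, %sumSq);
while (<>) {
    my ($x, $y) = split;
    next unless defined $y;    # skip blank lines
    $count{$x}++;
    $sum{$x}   += $y;
    $sumSq{$x} += $y * $y;
}
for my $i (sort keys %sum) {
    # sample standard deviation (n - 1 denominator), the same as R's sd()
    my $stdev = sqrt(($sumSq{$i} - $sum{$i} * $sum{$i} / $count{$i}) / ($count{$i} - 1));
    print $i, " ", $sum{$i} / $count{$i}, " ", $stdev, "\n";
}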

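The one-pass script above computes the variance as (sum of y^2 - (sum of y)^2 / n) / (n - 1). That identity is exact, but in floating point the subtraction can lose precision when the values are large compared with their spread. A commonly used alternative that is still one pass with constant memory per key is Welford's online algorithm; note that this is a different technique from the script above. The following is a minimal sketch, assuming whitespace-separated "key value" lines; the sub name group_stats and the file name data.txt are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: Welford's online algorithm, grouped by the first column.
# Keeps a running count, mean, and M2 (sum of squared deviations) per key.
sub group_stats {
    my ($fh) = @_;
    my (%n, %mean, %m2);
    while (my $line = <$fh>) {
        my ($key, $y) = split ' ', $line;
        next unless defined $y;    # skip blank or incomplete lines
        $n{$key}++;
        my $delta = $y - ($mean{$key} // 0);
        $mean{$key} += $delta / $n{$key};
        $m2{$key}   += $delta * ($y - $mean{$key});
    }
    my %result;
    for my $key (keys %n) {
        # sample standard deviation (n - 1); undef when a key has one value
        my $sd = $n{$key} > 1 ? sqrt($m2{$key} / ($n{$key} - 1)) : undef;
        $result{$key} = [ $mean{$key}, $sd ];
    }
    return \%result;
}

# Usage: perl group_stats.pl data.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "unable to open '$ARGV[0]': $!";
    my $stats = group_stats($fh);
    for my $key (sort keys %$stats) {
        my ($mean, $sd) = @{ $stats->{$key} };
        print join(' ', $key, $mean, $sd // 'n/a'), "\n";
    }
}
```

For the sample input, the means and standard deviations should agree with the R output above, printed with more digits.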