ansaurus

Question

Average of column by hours (rows) using awk

Answer 1

Score: +1

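Awk has associative arrays, so you can keep a running total and a row count for each hour, keyed by the hour, and compute the averages once all the input has been read.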
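Here is a minimal sketch of that idea. It assumes a hypothetical whitespace-separated input of the form "YYYY-MM-DD HH:MM:SS seconds", because the question's real format is not shown here, and the function name avg_by_hour is also only illustrative. It keeps one associative array of totals and one of counts, both keyed by date and hour, and divides at the end:

```shell
#!/bin/sh
# Hypothetical input format (an assumption, not taken from the question):
#   YYYY-MM-DD HH:MM:SS <seconds>
# e.g. "2010-10-28 12:15:01 3.123"
avg_by_hour() {
  awk 'NF == 3 {                          # skip blank or malformed lines
         hour = $1 " " substr($2, 1, 2)   # key: date plus two-digit hour
         total[hour] += $3                # running sum for this hour
         count[hour]++                    # number of rows for this hour
       }
       END {
         for (h in total) printf "%s %.3f\n", h, total[h] / count[h]
       }' "$@" | sort                     # for-in order is unspecified, so sort
}

# Usage: read from stdin (or pass file names as arguments).
printf '%s\n' \
  '2010-10-28 12:15:01 3.0' \
  '2010-10-28 12:45:30 6.0' \
  '2010-10-28 13:02:00 2.5' | avg_by_hour
```

Keying on a string such as "2010-10-28 12" rather than on the hour alone keeps the same hour on different days in separate buckets.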

Novikov 2010-10-28 19:20:01

Answer 2

Score: +3

I would set the field separator to a colon, so that the first field holds the date and hour; then accumulate the sum and the count for each of those keys in associative arrays, and finally compute the averages:

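gawk -F: 'NF == 4 { sum[$1] += $4; N[$1]++ }      # $1 is the date and hour: add to its sum and count
          END     { for (key in sum) {             # visit every date-hour key
                        avg = sum[key] / N[key];
                        printf "%s %f\n", key, avg;
                    } }' filename | sort              # for-in order is unspecified, so sort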

On your test data, this gives:

2010-10-28 12 4.348022
2010-10-28 13 3.514688
2010-10-28 14 7.681355

This should produce the correct answer even if the data is not in time order (say, if you concatenate two log files out of sequence). Note that gawk converts a field such as ' 3.123 secs' to the number 3.123 (it uses the leading numeric part), so the trailing unit does no harm when summing. The final sort is needed to present the averages in time sequence, because there is no guarantee about the order in which the for (key in sum) loop visits the keys.
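To see the out-of-order behavior, the sketch below wraps the same awk program in a shell function and feeds it lines that are deliberately out of time order. The function name hourly_averages and the sample lines are hypothetical, since the question's data is not shown; the format "YYYY-MM-DD HH:MM:SS: <seconds> secs" is an assumption chosen so that splitting on ":" yields exactly four fields. Plain awk is used instead of gawk, on the basis that POSIX awk performs the same leading-number string conversion:

```shell
#!/bin/sh
# Hypothetical log format (an assumption; the question's data is not shown):
#   YYYY-MM-DD HH:MM:SS: <seconds> secs
# Splitting on ":" gives $1 = "YYYY-MM-DD HH" and $4 = " <seconds> secs".
hourly_averages() {
  awk -F: 'NF == 4 { sum[$1] += $4; N[$1]++ }   # " 4.000 secs" adds as 4
           END     { for (key in sum) printf "%s %f\n", key, sum[key] / N[key] }' "$@" |
    sort                                           # restore time order
}

# Out of time order, as if two logs had been concatenated.
printf '%s\n' \
  '2010-10-28 13:05:10: 4.000 secs' \
  '2010-10-28 12:00:01: 3.000 secs' \
  '2010-10-28 13:45:00: 2.000 secs' \
  '2010-10-28 12:30:00: 5.000 secs' | hourly_averages
```

The 12 o'clock rows average (3 + 5) / 2 and the 13 o'clock rows (4 + 2) / 2, so the output is "2010-10-28 12 4.000000" followed by "2010-10-28 13 3.000000", regardless of input order. Lines without exactly four colon-separated fields are ignored by the NF == 4 guard.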

Jonathan Leffler 2010-10-28 19:27:28

Works like a charm. Thank you both, Jonathan and Novikov. I'll now reverse-engineer it and try to understand what all the different parts (especially the arrays) do... (-;

KM 2010-10-29 13:40:56
