ansaurus

Question

How to perform calculation over a log file

Answer 1

+6 A:

Use bash and awk:

cat file | sed -ne 's:^.*INFO.*\[\([0-9, ]*\)\][ \r]*$:\1:p' | awk -F ' *, *' '{ sum2 += $2 ; sum3 += $3 } END { if (NR>0) printf "avg2=%.2f, avg3=%.2f\n", sum2/NR, sum3/NR }'

Sample output (for your original data):

avg2=2859.59, avg3=149.94

Of course, you do not need cat; it is included for legibility and to show that the input data can come from any pipe. If you have to operate on an existing file, run sed -ne '...' file | ... directly.

EDIT

If you have access to gawk (GNU awk), you can eliminate the need for sed as follows:

cat file | gawk '{ if(match($0, /.*INFO.*\[([0-9, ]*)\][ \r]*$/, a)) { cnt++; split(a[1], b, / *, */); sum2+=b[2]; sum3+=b[3] } } END { if (cnt>0) printf "avg2=%.2f, avg3=%.2f\n", sum2/cnt, sum3/cnt }'

Same remarks re. cat apply.

A bit of explanation:

sed only prints lines (the -n ... :p combination) that match the regular expression: lines containing INFO followed by any combination of digits, spaces and commas between square brackets at the end of the line, allowing for trailing spaces and CR. For each matching line, it keeps only what is between the square brackets (\1, corresponding to what is between \(...\) in the regular expression) before printing (:p)
- sed will output lines that look like: 8541, 931, 0, 0
awk uses a comma surrounded by 0 or more spaces (-F ' *, *') as the field delimiter; $1 corresponds to the first column (e.g. 8541), $2 to the second, etc. Missing columns count as 0
- at the end, awk divides the accumulators sum2 and sum3 by the number of records processed, NR
gawk does everything in one shot. It first tests whether each line matches the same regular expression passed to sed in the previous example (except that, unlike sed, awk does not require a \ in front of the round brackets delimiting areas of interest). If the line matches, what is between the round brackets ends up in a[1], which we then split using the same separator (a comma surrounded by any number of spaces) and use to accumulate. I introduced cnt instead of continuing to use NR because the number of records processed (NR) may be larger than the number of relevant records (cnt) if not all lines are of the form INFO ... [...comma-separated-numbers...]. That was not an issue with sed|awk, since sed guaranteed that all lines passed on to awk were relevant.
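
For readers who do not use sed or awk, the match, split and accumulate steps explained above can be sketched in Python. This is only an illustration, not part of the original answer: the function name column_averages and the sample log lines are hypothetical, and the pattern is simply the sed/gawk expression carried over to Python's re module.

```python
import re

# Same idea as the sed/gawk pattern: a line containing INFO that ends with
# [digits, spaces and commas], allowing trailing spaces and CR.
PATTERN = re.compile(r"^.*INFO.*\[([0-9, ]*)\][ \r]*$")


def column_averages(lines):
    """Return (avg2, avg3) over the matching lines, or None if none match."""
    cnt = 0
    sum2 = 0
    sum3 = 0
    for line in lines:
        m = PATTERN.match(line.rstrip("\n"))
        if m is None:
            continue  # irrelevant line, like the ones sed filters out
        fields = [f.strip() for f in m.group(1).split(",")]

        def value(i):
            # Missing or empty columns count as 0, mirroring awk.
            return int(fields[i]) if i < len(fields) and fields[i] else 0

        cnt += 1
        sum2 += value(1)  # second column, $2 in awk
        sum3 += value(2)  # third column, $3 in awk
    if cnt == 0:
        return None
    return sum2 / cnt, sum3 / cnt


# Hypothetical sample lines (assumed format, not the asker's real data).
sample = [
    "2009-03-04 10:00:00 INFO request done [8541, 931, 0, 0]",
    "2009-03-04 10:00:01 DEBUG ignored [1, 2, 3, 4]",
    "2009-03-04 10:00:02 INFO request done [100, 69, 10, 1]\r",
]

result = column_averages(sample)
print("avg2=%.2f, avg3=%.2f" % result)  # prints avg2=500.00, avg3=5.00
```

As in the gawk version, the divisor is the count of matching lines rather than the total number of lines read.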

Cheers, V.

vladr 2009-03-04 23:16:29

Awesome! Thanks for the explanations as well!

Julien Genestoux 2009-03-05 00:20:52

Answer 2

+1 A:

Posting the reply I pasted to you over IM here too, just because it makes me try StackOverflow out :)

# replace $2 with the column you want to average
awk '{ print $2 }' log | perl -ne 'END{ printf "%.2f\n", $total/$n if $n }; chomp; $total+= $_; $n++'

Yann 2009-03-05 00:36:05

Answer 3

A:

Use nawk or /usr/xpg4/bin/awk on Solaris.

awk -F'[],]' 'END { 
  print s/NR, t/ct 
  }  
{ 
  s += $(NF-3) 
  if ($(NF-1)) {
    t += $(NF-2)
    ct++
    }
  }' infile

radoulov 2009-03-05 11:41:17

Answer 4

A:

Use Python

# Python 3
logfile = open("somelogfile.log", "r")
sum2, count2 = 0, 0
sum3, count3 = 0, 0
for line in logfile:
    # find right-most brackets
    _, bracket, fieldtext = line.rpartition('[')
    datatext, bracket, _ = fieldtext.partition(']')
    # split fields and convert to integers (assumes every line ends with [n, n, n, n])
    data = list(map(int, datatext.split(',')))
    # compute sums and counts
    sum2 += data[1]
    count2 += 1
    # the third column is only averaged over lines whose fourth column is non-zero
    if data[3] != 0:
        sum3 += data[2]
        count3 += 1
logfile.close()

print(sum2, count2, sum2 / count2)
print(sum3, count3, sum3 / count3)

S.Lott 2009-03-05 11:49:09
