On a Linux server that I work with, a process writes randomly-named files at random intervals. Here's a small sample, showing the file size, modification date & time, and file name:

27659   2009-03-09  17:24  APP14452.log
0       2009-03-09  17:24  vim14436.log
20      2009-03-09  17:24  jgU14406.log
15078   2009-03-10  08:06  ySh14450.log
20      2009-03-10  08:06  VhJ14404.log
9044    2009-03-10  15:14  EqQ14296.log
8877    2009-03-10  19:38  Ugp14294.log
8898    2009-03-11  18:21  yzJ14292.log
55629   2009-03-11  18:30  ZjX14448.log
20      2009-03-11  18:31  GwI14402.log
25955   2009-03-12  19:19  lRx14290.log
14989   2009-03-12  19:25  oFw14446.log
20      2009-03-12  19:28  clg14400.log

(Note that sometimes the file size can be zero.)

What I would like is a bash script to sum the size of the files, broken down by date, producing output something like this (assuming my arithmetic is correct):

27679 2009-03-09
33019 2009-03-10
64547 2009-03-11
40964 2009-03-12

The results would show activity trends over time, and highlight the exceptionally busy days.

In SQL, the operation would be a cinch:

SELECT SUM(filesize), filedate
FROM files
GROUP BY filedate;

Now, this is all probably pretty easy in Perl or Python, but I'd really prefer a bash shell or awk solution. It seems especially tricky to me to group the files by date in bash (especially if you can't assume a particular date format). Summing the sizes could be done in a loop I suppose, but is there an easier, more elegant, approach?

+4  A: 

I often use this idiom of Awk:

awk '{sum[$2]+= $1;}END{for (date in sum){print sum[date], date;}}'
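As a quick check of this idiom, here is a minimal sketch that feeds it the first five rows of the sample listing from the question. The helper name `sum_by_date` and the `sample` variable are made up for illustration, and the trailing `sort -k2` is an addition, since awk does not promise any particular order for `for (date in sum)`.

```shell
# First five rows of the question's sample: size, date, time, name.
sample='27659 2009-03-09 17:24 APP14452.log
0 2009-03-09 17:24 vim14436.log
20 2009-03-09 17:24 jgU14406.log
15078 2009-03-10 08:06 ySh14450.log
20 2009-03-10 08:06 VhJ14404.log'

# Hypothetical helper: reads "size date ..." rows on stdin and
# prints one "total date" line per distinct date, ordered by date.
sum_by_date() {
    awk '{ sum[$2] += $1 } END { for (date in sum) print sum[date], date }' |
        sort -k2
}

printf '%s\n' "$sample" | sum_by_date
# prints:
# 27679 2009-03-09
# 15098 2009-03-10
```

The associative array `sum` plays the role of the SQL `GROUP BY`: each distinct value of field 2 becomes a key, and the sizes accumulate under it.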
ashawley
That's beautiful. I didn't realize awk supported dictionaries so simply.
yukondude
A: 

Following the suggestions from ashawley and vartec, the following "one-liner" does the trick superbly:

ls -l --time-style=long-iso *log |
    awk '{sum[$6]+= $5;}END{for (s in sum){print sum[s], s;}}' |
    sort -k2 |
    column -t
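The pipeline above depends on the column layout of `ls -l`, which shifts if a file name, owner, or group contains spaces. A possible variation, assuming GNU find (for `-maxdepth` and `-printf`), asks find for exactly the two fields that matter. The function name `size_by_day` is hypothetical.

```shell
# Hypothetical helper: sum the sizes of *log files directly inside a
# directory (default: current directory), grouped by modification date.
# Assumes GNU find, which provides -maxdepth and -printf.
size_by_day() {
    # %s = size in bytes, %TY-%Tm-%Td = modification date as YYYY-MM-DD
    find "${1:-.}" -maxdepth 1 -type f -name '*log' -printf '%s %TY-%Tm-%Td\n' |
        awk '{ sum[$2] += $1 } END { for (d in sum) print sum[d], d }' |
        sort -k2
}

# Usage: size_by_day /path/to/logs | column -t
```

Because the file name never reaches awk, unusual names cannot shift the fields.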
yukondude
A: 

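On Linux you probably have GNU awk. Older GNU awk releases traverse an array in sorted order when the WHINY_USERS environment variable is set (its value does not matter), and printf can align the columns, so you don't need sort or column: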

ls -l --time-style=long-iso * |
  WHINY_USERS=-9 awk 'END {
    for (s in sum)
      printf "%-15s\t%s\n", sum[s], s
  }
  { sum[$6] += $5 }
  '
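WHINY_USERS was never a documented feature, and to my knowledge gawk 4.0 dropped it in favor of `PROCINFO["sorted_in"]`. The sketch below assumes that behavior and falls back to an external `sort` when `gawk` is not installed. The helper name `sum_sorted` is hypothetical, and it expects plain "size date" pairs on stdin rather than raw `ls -l` output.

```shell
# Hypothetical helper: read "size date" pairs on stdin and print
# "total date" lines in ascending date order.
sum_sorted() {
    if command -v gawk >/dev/null 2>&1; then
        # Assumes gawk 4.0 or later: "@ind_str_asc" makes for-in visit
        # the array indices in ascending string order.
        gawk 'BEGIN { PROCINFO["sorted_in"] = "@ind_str_asc" }
              { sum[$2] += $1 }
              END { for (d in sum) print sum[d], d }'
    else
        # Portable fallback: any awk, then sort on the date column.
        awk '{ sum[$2] += $1 } END { for (d in sum) print sum[d], d }' |
            sort -k2
    fi
}

# Usage: ls -l --time-style=long-iso * | awk '{ print $5, $6 }' | sum_sorted
```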
radoulov
+1  A: 

(find ... | xargs stat "--printf=%s+"; echo 0) | bc
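Note that the stat and bc pipeline produces a single grand total rather than one total per date. A rough sketch of a per-date version built on the same stat idea follows, assuming GNU stat (whose `%y` format starts with the modification date as YYYY-MM-DD) and GNU xargs (for `-r`). The function name `stat_by_day` and the find arguments are illustrative, not taken from the answer.

```shell
# Hypothetical helper: per-date size totals for *log files directly inside
# a directory (default: current directory), using stat instead of ls.
# Assumes GNU find, GNU xargs (-0, -r) and GNU stat (-c with %s and %y).
stat_by_day() {
    find "${1:-.}" -maxdepth 1 -type f -name '*log' -print0 |
        # %s = size in bytes; %y = "YYYY-MM-DD HH:MM:SS.NNN +ZZZZ"
        xargs -0 -r stat -c '%s %y' |
        # Field 1 is the size and field 2 is the date part of %y.
        awk '{ sum[$2] += $1 } END { for (d in sum) print sum[d], d }' |
        sort -k2
}

# Usage: stat_by_day /path/to/logs
```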

dobrokot