I wrote a small bash script to get the frequency of items in a given column of a file.

The output looks something like this:

A     30
B     25
C     20
D     15
E     10

The command I use inside the script is:

cut -f $1 $2| sort | uniq -c | 
sort -r -k1,1 -n | awk '{printf "%-20s %-15d\n", $2,$1}'

How can I modify it to show the relative percentage for each item as well? The output would then look like this:

A     30     30%
B     25     25%
C     20     20% 
D     15     15%
E     10     10%
A: 

Change your awk command to something like this:

awk '{ a[++n,1] = $2; a[n,2] = $1; t += $1 }
     END {
         for (i = 1; i <= n; i++)
             printf "%-20s %-15d%d%%\n", a[i,1], a[i,2], 100 * a[i,2] / t
     }'
schot
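
For anyone who wants to try the approach above end to end, here is a minimal sketch that wraps the whole pipeline in a function. The function name column_freq and the array names are hypothetical and not from the original script; it also assumes tab-separated input and values without embedded whitespace, and it prints one decimal place instead of a truncated integer percentage.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): column_freq COLUMN FILE
# Prints each distinct value of a tab-separated column with its count and
# its share of the total, most frequent first.
column_freq() {
    local col=$1 file=$2
    cut -f "$col" "$file" | sort | uniq -c | sort -k1,1nr |
    awk '{
             name[++n] = $2     # the value (assumed to contain no whitespace)
             count[n]  = $1     # occurrences as reported by uniq -c
             total    += $1     # running total across all values
         }
         END {
             # The total is only known after the last line, so print here.
             for (i = 1; i <= n; i++)
                 printf "%-20s %-15d %.1f%%\n", name[i], count[i], 100 * count[i] / total
         }'
}
```

Called as column_freq 2 data.tsv (data.tsv being a placeholder file name), it reports the second column. Storing the rows in arrays indexed by line number keeps the order produced by the preceding sort, so no second sort is needed.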
A: 

Try this (with the sort moved to the end, since awk's "for (i in array)" loop does not visit the keys in any particular order):

cut -f $1 $2| sort | uniq -c  | awk '{array[$2]=$1; sum+=$1} END { for (i in array) printf "%-20s %-15d %6.2f%%\n", i, array[i], array[i]/sum*100}' | sort -r -k2,2 -n
Dennis Williamson