I have a text file like this:

Apple
Orange
Orange
Banana
Banana
Orange
Banana
Orange
Apple
Orange

I want to produce the following output after running a bash shell script:

Apple: 2
Orange: 5
Banana: 3

It's pretty standard stuff if I use a full-blown language like Java or C++, but what is the quickest way to do it with a shell script or from the command line?

+3  A: 

sort filename | uniq -c | awk '{ print $2 ": " $1 }'

rangalo
No need to cat!
Jefromi
agreed, no need to cat
rangalo
The additional awk will format it as required
rangalo
I am super-sorry about the -1 I apparently gave you - just a misclick; I immediately tried to fix it and it tells me it's too old.
Jefromi
no probs, I gave you +1 for the correction you suggested
rangalo
+10  A: 
sort $FILE | uniq -c

will give you

2 Apple
3 Banana
5 Orange
Jefromi
and to reformat, you can use perl as NawaMan said, or sed: `... | sed -r 's/ *([0-9]+) *(.*)/\2: \1/'` (the `-r` switches it to extended regex, and the substitution is the same as NawaMan's without the unnecessary brackets).
Jefromi
Agree with this as best, because it's likely that the user is flexible on output format. Requirements are often agreed upon after a dialog has started.
ericslaw
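As a small sketch of the sed reformatting step mentioned in the comment above: `-r` is the GNU sed spelling for extended regular expressions, while `-E` is accepted by both GNU and BSD sed, so it is used here. The sample input line is made up for illustration; the pattern keeps the `+` inside the capture group so that multi-digit counts survive intact.

```shell
# Sample line in the shape that `uniq -c` emits (padding, count, value).
# The input is illustrative only, not taken from the question's file.
printf '     12 Apple\n' | sed -E 's/ *([0-9]+) *(.*)/\2: \1/'
```

With a real file, the same sed command would go at the end of the `sort $FILE | uniq -c` pipeline.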
+2  A: 
uniq -c $FILE | perl -pe 's|[ ]*([0-9]+)[ ]*(.*)|\2: \1|'

This will format it the way you specified. You can add '| sort' at the end to sort it too.

EDIT: As pointed out in the comments, I made a mistake about uniq, so here is the corrected version.

sort $FILE | uniq -c | perl -pe 's|[ ]*([0-9]+)[ ]*(.*)|\2: \1|'

Sorry for the problem.

NawaMan
`uniq` checks for consecutive identical lines. You must sort the list first.
Jefromi
Thanks for pointing that out. I'm mostly used to already-sorted data, so I forgot about that.
NawaMan
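The point raised in these comments, that `uniq` only collapses adjacent duplicate lines, can be seen with a four-line input where the two `Apple` lines are not next to each other. The input is made up for illustration.

```shell
# Without sort: the two Apple lines are not adjacent, so uniq reports
# Apple twice, each with a count of 1.
printf '%s\n' Apple Orange Orange Apple | uniq -c

# With sort: identical lines become adjacent, so each value appears once
# with its full count.
printf '%s\n' Apple Orange Orange Apple | sort | uniq -c
```

The first pipeline prints three lines and the second prints two, which is why the corrected answer sorts before calling `uniq -c`.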
+2  A: 

This solution uses only one tool: awk

$ awk '{count[$0]++} END {for (c in count) {print c ": " count[c]}} ' count.txt
Orange: 5
Banana: 3
Apple: 2
Hai Vu
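Note that awk's `for (c in count)` visits keys in an unspecified order, which is why the output above does not match the Apple, Orange, Banana order asked for in the question. If that order matters, one possible variation is to record each line the first time it is seen and print in that order. This is only a sketch; the function name `count_in_order` and the arrays `order` and `n` are made up for illustration.

```shell
# count_in_order is a hypothetical helper name, not part of the answer above.
# It reads the files given as arguments, or stdin when none are given.
count_in_order() {
    awk '
        !($0 in count) { order[++n] = $0 }   # remember each new line in first-seen order
        { count[$0]++ }                      # tally every line
        END {
            for (i = 1; i <= n; i++)
                print order[i] ": " count[order[i]]
        }
    ' "$@"
}

# Same data as in the question, fed through stdin.
printf '%s\n' Apple Orange Orange Banana Banana Orange Banana Orange Apple Orange | count_in_order
```

For the sample data this prints Apple: 2, Orange: 5, Banana: 3, one per line, still using a single awk process and no sort.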