I have an awk script that runs through a file and counts every occurrence of a given date. The dates in the original file are in the default output format of the date command, like this:
Thu Mar 5 16:46:15 EST 2009

I use awk to throw away the weekday, time, and time zone, and then do my counting by pumping the dates into an associative array with the dates as indices.
In order to get the output to be sorted by date, I converted the dates to a different format that I could sort with bash sort.
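For reference, those two steps look roughly like this (a sketch, not my exact script; original.txt is the raw log file, the same name used in the update below):

awk 'BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m)
             for (i = 1; i <= 12; i++) num[m[i]] = i }       # month name -> number
     { count[sprintf("%02d/%02d/%s", num[$2], $3, $6)]++ }   # key like 03/05/2009
     END { for (d in count) print d, count[d] }' original.txt | sort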
Now, my output looks like this:
Date Count
03/05/2009 2
03/06/2009 1
05/13/2009 7
05/22/2009 14
05/23/2009 7
05/25/2009 7
05/29/2009 11
06/02/2009 12
06/03/2009 16
I'd really like the output to have more human-readable dates, like this:
Mar 5, 2009
Mar 6, 2009
May 13, 2009
May 22, 2009
May 23, 2009
May 25, 2009
May 29, 2009
Jun 2, 2009
Jun 3, 2009
Any suggestions for a way I could do this? If I could do this on the fly when I output the count values that would be best.
UPDATE: Here's my solution incorporating ghostdog74's example code:
grep -i "E[DS]T 2009" original.txt | awk '{printf "%s %2.d, %s\r\n",$2,$3,$6}' >dates.txt #outputs dates for counting
date -f dates.txt +'%Y %m %d' | awk ' #reformat dates as YYYY MM DD for the later sort
{++total[$0]} #pump dates into associative array
END {
for (item in total) printf "%s\t%s\r\n", item, total[item] #output dates as yyyy mm dd with counts
}' | sort -t $'\t' | awk ' #send to sort, then to cleanup
BEGIN {printf "%s\t%s\r\n","Date","Count"}
{t=$1" "$2" "$3" 0 0 0" #cleanup using example by ghostdog74
printf "%s\t%2.d\r\n",strftime("%b %d, %Y",mktime(t)),$4
}'
rm dates.txt
Sorry this looks so messy. I've tried to put clarifying comments in.
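For what it's worth, the same output could probably be produced in a single gawk pass, since gawk can build the sortable key, count, sort, and pretty-print on its own. A rough sketch, not the solution above; it assumes GNU awk for asorti(), mktime(), and strftime():

gawk 'BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m)
              for (i = 1; i <= 12; i++) num[m[i]] = i         # month name -> number
              printf "%s\t%s\n", "Date", "Count" }
      /E[DS]T 2009/ { count[sprintf("%04d %02d %02d", $6, num[$2], $3)]++ }
      END { n = asorti(count, keys)                           # gawk: sort the date keys
            for (i = 1; i <= n; i++) {
                d = keys[i]
                printf "%s\t%d\n", strftime("%b %d, %Y", mktime(d " 0 0 0")), count[d]
            }
      }' original.txt

That would avoid the temporary dates.txt file and the separate calls to grep, date, and sort.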