tags:

views:

62

answers:

3

I have a transaction log file in CSV format that I want use to run statistics. The log has the following fields:

date:  Time/date stamp
salesperson:  The username of the person who closed the sale
promo:  sum total of items in the sale that were promotions.
amount:  grand total of the sale

I'd like to get the following statistics:

salesperson:  The username of the salesperson being analyzed.
minAmount:  The smallest grand total of this salesperson's transaction.
avgAmount:  The mean grand total..
maxAmount:  The largest grand total..
minPromo:  The smallest promo amount by the salesperson.
avgPromo:  The mean promo amount...

I'm tempted to build a database structure, import this file, write SQL, and pull out the stats. I don't need anything more from this data than these stats. Is there an easier way? I'm hoping some bash script could make this easy.

+2  A: 

Could also bang out an awk script to do it. It's only CSV with a few variables.

qor72
Awk's a natural for this - even has begin/end so you can initialize and then calcualte averages easily
Martin Beckett
very interesting. How can I use awk to do a "GROUP BY salesperson"?
User1
@User1: With an associative array or `asort()` or `asorti()`.
Dennis Williamson
Done. This was way easier than building a database. It was definitely worth beefing up my awk skills!
User1
+1  A: 

You can loop through the lines in the CSV and use bash script variables to hold your min/max amounts. For the average, just keep a running total and then divide by the total number of lines (not counting a possible header).

Here are some useful snippets for working with CSV files in bash.

If your data might be quoted (e.g. because a field contains a comma), processing with bash, sed, etc. becomes much more complex.

Eric J.
+2  A: 

TxtSushi does this:

tssql -table trans transactions.csv \
'select
    salesperson,
    min(as_real(amount)) as minAmount,
    avg(as_real(amount)) as avgAmount,
    max(as_real(amount)) as maxAmount,
    min(as_real(promo)) as minPromo,
    avg(as_real(promo)) as avgPromo
from trans
group by salesperson'

I have a bunch of example scripts showing how to use it.

Edit: fixed syntax

Keith
+1 This looks much easier. I'll have to give it a shot next time. BTW: Are you an aardvark?
User1
no sir, I'm a sugar glider
Keith
That's what I was afraid of..I hope your owner doesn't get busted for having you in his home (if you're in the US). I heard you guys were "especially intelligent" but, wow, answering on SO is quite over the top. So, is it the sugar that makes you so smart?
User1