I want to find the average rainfall for any three states, say CA, TX and AX, for a particular month from Jan to Dec. The input file is tab-delimited and has the format: city name, state, then average rainfall amounts for January through December, and then an annual average over all months. For example, it may look like:

AVOCA   PA  30  2.10    2.15    2.55    2.97    3.65    3.98    3.79    3.32     3.31   2.79    3.06    2.51    36.18
BAKERSFIELD CA  30  0.86    1.06    1.04    0.57    0.20    0.10    0.01    0.09    0.17    0.29    0.70    0.63    5.72

What I want to do is get the sum of the average rainfall for a particular month, say Feb, over n years, and then find its average for the states CA, TX and AX.

I wrote the awk script below to do this, but it doesn't give me the expected output:

/^CA$/ {CA++; CA_SUM+= $5} # ^CA$ - Regular Expression to match the word CA only 
/^TX$/ {TX++; TX_SUM+= $5} # ^TX$ - Regular Expression to match the word TX only  
/^AX$/ {AX++; AX_SUM+= $5} # ^AX$ - Regular Expression to match the word AX only 
END {
     CA_avg = CA_SUM/CA;
     TX_avg = TX_SUM/TX;
     AX_avg = AX_SUM/AX; 
     printf("CA Rainfall: %5.2f",CA_avg);
     printf("CA Rainfall: %5.2f",TX_avg);
     printf("CA Rainfall: %5.2f",AX_avg);
    }

I invoke the program with the command awk 'FS="\t"'-f awk1.awk rainfall.txt and see no output.

Question: Where am I going wrong? Any suggestions and corrected code would be appreciated.

+2  A: 

Your regexps should be

/ CA / {CA++; CA_SUM+= $5} # / CA / - matches CA surrounded by spaces anywhere on the line
/ TX / {TX++; TX_SUM+= $5} # / TX / - matches TX surrounded by spaces anywhere on the line
/ AX / {AX++; AX_SUM+= $5} # / AX / - matches AX surrounded by spaces anywhere on the line

/^AX$/ matches only if AX is the only text on the line.
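To see the difference, here is a small sketch that feeds awk two lines: a tab-delimited record shaped like the question's data and a line containing nothing but CA. The input is made up for illustration, and the field comparison `$2 == "CA"` assumes the state is always the second tab-separated field.

```shell
# Two input lines: a tab-delimited record and a line holding only "CA".
input=$(printf 'BAKERSFIELD\tCA\t30\t0.86\t1.06\nCA\n')

# The anchored regex is tested against the entire record.
anchored=$(printf '%s\n' "$input" | awk '/^CA$/ { print NR }')

# Comparing the second tab-separated field picks out the state column instead.
by_field=$(printf '%s\n' "$input" | awk -F '\t' '$2 == "CA" { print NR }')

echo "anchored regex matched line(s): $anchored"
echo "field test matched line(s): $by_field"
```

The anchored pattern only fires on the bare CA line, while the field test fires on the actual data record.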

HTH!

EDIT

/ CA / {CA++; CA_SUM+= $5} # / CA / - matches CA surrounded by spaces anywhere on the line
/ TX / {TX++; TX_SUM+= $5} # / TX / - matches TX surrounded by spaces anywhere on the line
/ AX / {AX++; AX_SUM+= $5} # / AX / - matches AX surrounded by spaces anywhere on the line
END {
 # Only divide when at least one line matched, to avoid division by zero
 if(CA!=0){CA_avg = CA_SUM/CA;     printf("CA Rainfall: %5.2f\n",CA_avg);}
 if(TX!=0){TX_avg = TX_SUM/TX;     printf("TX Rainfall: %5.2f\n",TX_avg);}
 if(AX!=0){AX_avg = AX_SUM/AX;     printf("AX Rainfall: %5.2f\n",AX_avg);}
}
belisarius
@belisarius - does not work - I see no output again .
Eternal Learner
@Eternal try removing your FS from the command line
belisarius
@belisarius: Gives me a division by zero error
Eternal Learner
@eternal wait ... testing
belisarius
@belisarius: I tried something like this and I got a division by zero error BEGIN { FS = "\t" } ; /\\tCA\\t/ {CA++; cA_SUM+= $5} # ^CA$ - Regular Expression to match the word CA only /\\tTX\\t/ {TX++; TX_SUM+= $5} # ^TX$ - Regular Expression to match the word TX only /\\tAX\\t/ {AX++; AX_SUM+= $5} # ^AX$ - Regular Expression to match the word AX only END { CA_avg = CA_SUM/CA; TX_avg = TX_SUM/TX; AX_avg = AX_SUM/AX; printf("CA Rainfall: %5.2f",CA_avg); printf("CA Rainfall: %5.2f",TX_avg); printf("CA Rainfall: %5.2f",AX_avg); }
Eternal Learner
@Eternal Working now .. don't post code in comments :)
belisarius
@Eternal It's running at http://ideone.com/tcHg1
belisarius
@belisarius: Hey, I changed it to something like the code below and it works:
BEGIN { FS = "\t" } ;
/ CA / {CA++; CA_SUM+= $5} # CA - Regular Expression to match the word CA only
/ TX / {TX++; TX_SUM+= $5} # TX - Regular Expression to match the word TX only
/ AK / {AK++; AK_SUM+= $5} # AK - Regular Expression to match the word AK only
END { CA_AVG = CA_SUM/CA; TX_AVG = TX_SUM/TX; AK_AVG = AK_SUM/AK; printf("CA Rainfall: %f",CA_AVG); printf("TX Rainfall: %f",TX_AVG); printf("AK Rainfall: %f",AK_AVG); }
Thanks for your help.
Eternal Learner
+3  A: 

The pattern /^CA$/ means the characters "C" and "A" are the only characters on the line. You want:

$2 == "CA" {CA++; CA_SUM+= $5}
# etc.

However, this is DRYer:

{ count[$2]++; sum[$2] += $5 }
END {
    for (state in count) {
        printf("%s Rainfall: %5.2f\n", state, sum[state]/count[state])
    }
}
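As a quick check of the array version, here is a sketch run against a tiny sample file. The first two rows are the question's rows truncated after the February column; the third row (EXAMPLETOWN) is made up so that CA has two values to average. The output goes through sort because awk does not guarantee any order for `for (state in count)`.

```shell
# Sample data: two rows from the question (cut after February) plus one
# hypothetical CA row, so that CA averages over two cities.
tmp=$(mktemp)
printf 'AVOCA\tPA\t30\t2.10\t2.15\n'        >  "$tmp"
printf 'BAKERSFIELD\tCA\t30\t0.86\t1.06\n'  >> "$tmp"
printf 'EXAMPLETOWN\tCA\t30\t1.00\t2.00\n'  >> "$tmp"

# Field 5 is February; accumulate per state, then print each average.
out=$(awk -F '\t' '
    { count[$2]++; sum[$2] += $5 }
    END {
        for (state in count)
            printf("%s Rainfall: %5.2f\n", state, sum[state]/count[state])
    }
' "$tmp" | sort)

echo "$out"
rm -f "$tmp"
```

With these three rows it should print 1.53 for CA (the mean of 1.06 and 2.00) and 2.15 for PA.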

Also, this looks wrong: awk 'FS="\t"'-f awk1.awk rainfall.txt
try: awk -F '\t' -f awk1.awk rainfall.txt
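To see what the original command line actually hands to awk, this sketch lets the shell split the same words and prints the result. Nothing here runs awk; `set --` is only used to capture the arguments.

```shell
# Let the shell parse the same words that followed "awk" in the original command.
set -- 'FS="\t"'-f awk1.awk rainfall.txt

arg_count=$#
first_arg=$1

printf 'argument count: %s\n' "$arg_count"
printf 'first argument: %s\n' "$first_arg"
```

Because there is no space between the closing quote and -f, the shell glues them into the single word FS="\t"-f. awk then takes that word as the program text and treats both awk1.awk and rainfall.txt as input files, so the script is never loaded, which would explain the missing output.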


Response to comments:

awk -F '\t' -v month=2 -v states="CA,AZ,TX" '
    BEGIN {
        month_col = month + 3  # January (month 1) is field 4
        n = split(states, wanted_states, ",")
    }
    { count[$2]++; sum[$2] += $month_col }
    END {
        # split() stores the state names as values, indexed 1..n
        for (i = 1; i <= n; i++) {
            state = wanted_states[i]
            if (state in count)
                printf("%s Rainfall: %5.2f\n", state, sum[state]/count[state])
            else
                print state " Rainfall: no data"
        }
    }
' rainfall.txt
glenn jackman
+1 for a more general solution and mentioning DRY in the context of rain.
schot
+1 Much better than mine. I was thinking only of correcting the OP's errors, which always begets a shortsighted answer. You could improve it a bit more by accepting the month number as a command-line parameter. Just my 2 cents.
belisarius
You could change your DRY version to select particular states: `awk -v statelist="AK CA TX" 'match(statelist,$2){ count[$2]++; sum[$2] += $5 } ...`. Or use a shell variable instead of the literal: `states="AK CA TX"; awk -v statelist="$states" '...'` (the variable needs quoting because it contains spaces).
Dennis Williamson