I am using awk to sum one column of a CSV file. The data format is something like:

id, name, value
1, foo, 17
2, bar, 76
3, "I am the, question", 99

I was using this awk script to count the sum:

awk -F, '{sum+=$3} END {print sum}'

Some of the values in the name field contain a comma, and this breaks my awk script. My question is: can awk solve this problem? If yes, how can I do that?

Thank you.

A: 

You can always tackle the problem at the source. Put quotes around the name field, just like the "I am the, question" field. This is much easier than spending your time coding workarounds.

Update (as Dennis requested): a simple example.

$ s='id, "name1,name2", value 1, foo, 17 2, bar, 76 3, "I am the, question", 99'

$ echo $s|awk -F'"' '{ for(i=1;i<=NF;i+=2) print $i}'
id,
, value 1, foo, 17 2, bar, 76 3,
, 99

$ echo $s|awk -F'"' '{ for(i=2;i<=NF;i+=2) print $i}'
name1,name2
I am the, question

As you can see, when the delimiter is set to the double quote, the quoted fields always land on even field numbers. Since the OP doesn't have the luxury of modifying the source data, this method will not be appropriate for him.
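For the summing problem itself, here is a minimal sketch of a related idea, assuming quoted fields never contain escaped double quotes or line breaks: blank out everything between double quotes, let awk re-split the record on commas, then add up the third field. The `data` variable and the `NR > 1` header skip are illustrative additions, not part of the question.

```shell
# Sample data from the question, one record per line (illustrative variable name).
data='id, name, value
1, foo, 17
2, bar, 76
3, "I am the, question", 99'

quoted_total=$(printf '%s\n' "$data" | awk -F, '
    NR > 1 {                      # skip the header row
        gsub(/"[^"]*"/, "")       # drop quoted text; changing $0 makes awk re-split the fields
        sum += $3                 # the value column is now reliably the third field
    }
    END { print sum }')

echo "$quoted_total"
```

With the sample data this should print 192 (17 + 76 + 99). It is not a full CSV parser, so a dedicated CSV library is still the safer choice for messy input.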

ghostdog74
Perhaps it would be helpful if you showed how to handle the quoted field.
Dennis Williamson
Thanks, Dennis. But the CSV file is generated by the client, so I can do nothing about the format of the file. :(
maguschen
A: 

You can write a function in awk like the one below:

$ awk 'func isnum(x){return(x==x+0)}BEGIN{print isnum("hello"),isnum("-42")}'
0 1

You can incorporate this function into your script and check whether the third field is numeric. If it is not, try the fourth field, and if that is not numeric either, try the fifth, and so on until you reach a numeric value. A loop will probably help here. Then add that value to the sum.
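A rough sketch of that loop, under a few assumptions: the `data` variable is just the sample from the question, the header row is skipped with `NR > 1`, and `isnum` here uses a regular expression rather than the `x==x+0` comparison, only to make the accepted number format explicit.

```shell
# Sample data from the question, one record per line (illustrative variable name).
data='id, name, value
1, foo, 17
2, bar, 76
3, "I am the, question", 99'

loop_total=$(printf '%s\n' "$data" | awk -F, '
    # True when x is an optionally signed integer or decimal, with optional surrounding blanks.
    function isnum(x) { return x ~ /^[ \t]*-?[0-9]+(\.[0-9]+)?[ \t]*$/ }
    NR > 1 {
        # Start at field 3 and move right until a numeric field turns up.
        for (i = 3; i <= NF; i++)
            if (isnum($i)) { sum += $i; break }
    }
    END { print sum }')

echo "$loop_total"
```

With the sample data this should print 192. Note the weakness: if a name contains a comma followed by something that looks like a number, that piece would be picked up instead of the real value.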

Vijay Sarathi
A: 

You're probably better off doing it in perl with Text::CSV, since that's a fast and robust solution.

Daenyth
Yes, I agree with you. I just wonder how awk handles this problem. :)
maguschen
A: 

If you know for sure that the 'value' column is always the last column:

awk -F, '{sum+=$NF} END {print sum}'

NF represents the number of fields in the current record, so $NF is the last column.
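As a quick check against the sample data from the question (the `data` variable is only an illustrative holder for it), the same command can be run unchanged. The header row needs no special handling, because awk treats the non-numeric text " value" as 0 when adding.

```shell
# Sample data from the question, one record per line (illustrative variable name).
data='id, name, value
1, foo, 17
2, bar, 76
3, "I am the, question", 99'

# The extra comma inside the quoted name shifts field numbers, but the last field is still the value.
last_total=$(printf '%s\n' "$data" | awk -F, '{sum+=$NF} END {print sum}')

echo "$last_total"
```

This should print 192, including the row whose name contains a comma.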

Hai Vu