Let's say I have a CSV file like this:

a,b1,12,
a,b1,42,
d,e1,12,
r,12,33,

I want to use grep to return only the rows where the third column is 12. So it would return:

a,b1,12,
d,e1,12,

but not:

r,12,33,

Any ideas for a regular expression that will allow me to do this?

+4  A: 
grep "^[^,]\+,[^,]\+,12," file.csv
Vivin Paliath
In case the line might contain more than three items, I would start the regex with `"^[^,]\+,...` to ensure that the third item is `12` (otherwise, it would incorrectly match a line like "a,b1,61,12").
bta
@bta that's a good point. I will update my solution.
Vivin Paliath
added a trailing comma to the regex so it doesn't match `123`
glenn jackman
Good catch glenn!
Vivin Paliath
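
To make the point of the comments above concrete, here is a small sketch comparing the pattern with and without the trailing comma. The two sample rows are made up for illustration, and the `\+` syntax assumes a grep that accepts it in basic regular expressions (GNU grep does).

```shell
# Two sample rows: one whose third field is 12, one whose third field is 123.
sample=$(printf '%s\n' 'a,b1,12,' 'x,y,123,')

# Without the trailing comma, the pattern also matches the 123 row.
loose=$(printf '%s\n' "$sample" | grep -c '^[^,]\+,[^,]\+,12')

# With the trailing comma, only the row whose third field is exactly 12 matches.
strict=$(printf '%s\n' "$sample" | grep -c '^[^,]\+,[^,]\+,12,')

echo "loose=$loose strict=$strict"
```

The trailing comma only works as an anchor here because every row in the question ends with a comma after the third field.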
+1  A: 

Here's a variation:

grep -E "^([^,]+,){2}12," file.csv

The advantage is that you can select the field simply by changing the number enclosed in curly braces without having to add or subtract literal copies of the pattern manually.
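
As a rough sketch of that idea, the repetition count can be computed from a field number. The helper name `match_field` is hypothetical (it is not part of grep), the value is interpreted as a regular expression, and the pattern assumes non-empty fields and a comma after the target field, as in the question's data.

```shell
# Sample data from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 'a,b1,12,' 'a,b1,42,' 'd,e1,12,' 'r,12,33,' > "$tmp"

# match_field N VALUE FILE: print lines whose Nth comma-separated field is VALUE.
# Hypothetical helper: it skips N-1 leading fields, then requires VALUE followed by a comma.
match_field() {
    skip=$(( $1 - 1 ))
    grep -E "^([^,]+,){$skip}$2," "$3"
}

match_field 3 12 "$tmp"   # rows whose third field is 12
match_field 2 12 "$tmp"   # rows whose second field is 12

rm -f "$tmp"
```

With the question's data, the first call selects the two rows from the expected output and the second selects the `r,12,33,` row.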

Dennis Williamson
As with vivin's answer, this needs a trailing comma to match only `12`
glenn jackman
+2  A: 

I'd jump straight to awk to test the value exactly:

awk -F, '$3 == 12' file.csv
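
One detail worth knowing about that comparison: when a field looks like a number, awk compares it numerically, so values such as `012` or `12.0` also satisfy `$3 == 12`. If only the literal text `12` should match, comparing against the string `"12"` is one option. The sample rows below are invented for illustration.

```shell
# Three rows whose third fields all have the numeric value 12.
rows=$(printf '%s\n' 'a,b,12,' 'a,b,012,' 'a,b,12.0,')

# Numeric comparison: count rows whose third field equals the number 12.
numeric=$(printf '%s\n' "$rows" | awk -F, '$3 == 12 { n++ } END { print n + 0 }')

# String comparison: count rows whose third field is the literal text 12.
literal=$(printf '%s\n' "$rows" | awk -F, '$3 == "12" { n++ } END { print n + 0 }')

echo "numeric=$numeric literal=$literal"
```

For the question's data the two forms give the same result; the difference only shows up when the column can contain alternative spellings of the same number.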

This, and any regexp-based solution, assumes that the values of the first two fields do not contain commas.
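
A quick sketch of what goes wrong when that assumption fails. The quoted field below is a made-up example of CSV quoting, which neither `awk -F,` nor the grep patterns understand.

```shell
# A quoted first field that contains a comma shifts the naive field count.
line='"Smith, J",b1,12,'

# awk splits on every comma, so its third field here is b1 rather than 12.
third=$(printf '%s\n' "$line" | awk -F, '{ print $3 }')
echo "third field seen by awk: $third"

# The filter therefore prints nothing for this line.
matches=$(printf '%s\n' "$line" | awk -F, '$3 == 12')
echo "matches: '$matches'"
```

For input like this, a real CSV parser is the safer choice.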

glenn jackman
I was thinking that as well. Better tool for the job if you ask me.
Steve
I really need to learn `awk`! I went from `tr` to `sed` to perl without touching `awk`
Vivin Paliath
A: 

When you have CSV files with a distinct delimiter such as a comma, split on the field delimiter instead of using a regular expression. Tools that break strings apart, such as awk, Perl, or Python, do the job easily (Perl and Python also have CSV modules for more complex CSV parsing).

With Perl:

$ perl -F/,/ -alne  'print if $F[2]==12;' file
a,b1,12,
d,e1,12,

or with awk

$ awk -F"," '$3==12' file
a,b1,12,
d,e1,12,

or with just the shell

# Split each line on commas; d receives whatever follows the third field.
while IFS="," read -r a b c d
do
    case "$c" in
        12) echo "$a,$b,$c,$d" ;;
    esac
done <"file"
ghostdog74