Can anyone explain how to use sed to delete all characters up to & including the 2nd comma on a line in a CSV file?

The beginning of a typical line might look like

1234567890,ABC/DEF,

The number of digits in the first column varies, i.e. there might be 9, 10 or 11 digits in random order, and the letters in the second column could also be random. This randomness and varying length make it impossible to search for any explicit pattern.

+3  A: 

You could do it with sed like this

sed -e 's/^\([^,]*,\)\{2\}//'

I'm not 100% sure on the syntax, but I tried it and it seems to work. Anchored at the start of the line, the pattern matches zero or more of anything-but-a-comma followed by a comma, twice in succession, and the match is replaced with nothing.
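As a quick check of the expression above, here is a minimal sketch that runs it against a made-up line. Only the first two fields come from the question; the trailing `foo,bar` fields and the `strip_two_fields` helper name are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: delete everything up to and including the 2nd comma.
strip_two_fields() {
    sed -e 's/^\([^,]*,\)\{2\}//'
}

# Sample line: first two fields from the question, the rest invented.
printf '%s\n' '1234567890,ABC/DEF,foo,bar' | strip_two_fields
```

This should print `foo,bar`. A line with fewer than two commas does not match the pattern, so sed passes it through unchanged.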

But even easier would be to use cut, like this

cut -d, -f3-

which will use comma as a delimiter, and print fields 3 and up.
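To see how cut treats short lines, here is a small sketch. The sample lines and the `drop_first_two` helper name are invented for illustration; only the "digits,letters," prefix shape comes from the question.

```shell
#!/bin/sh
# Hypothetical helper wrapping the cut invocation from the answer.
drop_first_two() {
    cut -d, -f3-
}

# Three or more fields: fields 3 and up are printed.
printf '%s\n' '1234567890,ABC/DEF,foo,bar' | drop_first_two

# Only two fields: an empty line is printed.
printf '%s\n' '123456789,XY/Z' | drop_first_two

# No comma at all: cut prints the whole line unchanged (add -s to suppress it).
printf '%s\n' 'no delimiter here' | drop_first_two
```

Note the difference from the sed version: a two-field line comes out empty here, whereas sed would leave it untouched.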

EDIT:
Just for the record, both sed and cut accept a file name as a parameter; just append it at the end, like so

cut -d, -f3- myfile.txt

or you can pipe the output of your program through them

./myprogram | cut -d, -f3-
roe
If you're using GNU `sed` you can do `sed -r 's/^([^,]*,){2}//'` which is a little easier on the eyes.
Dennis Williamson
@Dennis Williamson: yes, POSIX regexes don't seem to have ever been intended for human eyes, except when looking for fixed strings... :)
roe
A: 

sed is not the "right" choice of tool here (although it can be done). Since you have structured data, you can use a field/delimiter approach instead of writing a complicated regex.

You can use cut

$ cut -f3- -d"," file

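or gawk (OFS has to be set to a comma, otherwise the rebuilt line is joined with spaces; the sub() strips the two leading commas left by the emptied fields)

$ gawk -F"," -v OFS="," '{$1=$2=""; sub(/^,,/,"")}1'  file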
$ gawk -F"," '{for(i=3;i<NF;i++) printf "%s,",$i; print $NF}'  file
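A minimal sketch of how awk's output field separator affects field blanking, compared with the explicit loop. It uses plain `awk` rather than `gawk` on the assumption that either is available; the helper names and the trailing `foo,bar` fields are invented for illustration.

```shell
#!/bin/sh
# Blank the first two fields with the default OFS (a single space).
# Assigning to a field makes awk rebuild the record using OFS.
blank_fields() {
    awk -F, '{$1=$2=""}1'
}

# Print fields 3..NF, rejoining them with commas explicitly.
loop_fields() {
    awk -F, '{for(i=3;i<NF;i++) printf "%s,",$i; print $NF}'
}

line='1234567890,ABC/DEF,foo,bar'   # trailing fields are made up

printf '%s\n' "$line" | blank_fields   # commas are lost, two leading spaces remain
printf '%s\n' "$line" | loop_fields    # commas are preserved
```

The first call prints the remaining fields separated by spaces with two leading spaces, because the record is rebuilt with the default OFS; the loop version prints `foo,bar`.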
ghostdog74
A: 

Thanks for all the replies. With the help provided I have written the simple executable script below, which does what I want.

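#!/bin/bash

# Drop the first two comma-separated fields, then clean up the rest with sed:
# delete the first line, turn '-', ' ' and ':' into commas, and remove ',D'.
cut -d, -f3- ~/Documents/forex_convert/input.csv |
sed -e '1d' \
    -e 's/-/,/g' \
    -e 's/ /,/g' \
    -e 's/:/,/g' \
    -e 's/,D//g' > ~/Documents/forex_convert/converted_input

exit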
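For what it's worth, the three single-character substitutions in the sed stage can be folded into one bracket expression. The sketch below is only an illustration: the `normalise` helper name and both input lines are invented, since the real contents of input.csv are not shown in the thread.

```shell
#!/bin/sh
# Hypothetical helper: same edits as the script's sed stage, with '-', ' '
# and ':' handled by a single bracket expression ('-' first, so it is literal).
normalise() {
    sed -e '1d' -e 's/[- :]/,/g' -e 's/,D//g'
}

# Invented input: a first line to discard plus one data row.
printf '%s\n' 'date time,flag,rate' '2010-01-04 09:30:00,D,1.4321' | normalise
```

With this made-up input the first line is dropped and the data row becomes `2010,01,04,09,30,00,1.4321`.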

Andrew