Is there any way to exclude/delete/replace one field from a CSV file with a regexp in Notepad++?

I have a csv file with some data like this:

'1','data1','data2','data3','data4','data5','data6','data7','data8','data9',
'data10','data11','data12','data13','data14','data15','data16','data17','data18',
'data19','data20','data21','data22','data23','\'data24 with some commas, 
here and there and some "double quotes", and fullstops.','data25','data26'

The only problem I am facing is with data24, where I encounter \' followed by double quotes and other characters such as , and . inside the field. This always occurs in field 24. For clarity, I have added newlines here, but the entire text above is on just one line.

Any ideas on how to solve?

Thanks.

+1  A: 

Not reliably. It is probably easiest to change the file with some tool which knows how to handle CSV (OpenOffice).

If you still want to use a regex, take a look at the negative lookbehind, so that you match a single quote only if it is not preceded by a backslash.
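As a rough illustration of the lookbehind idea, here is a minimal Python sketch (Python's `re` syntax, not Notepad++'s own search dialog). The pattern, the `drop_field` helper, and the shortened sample line are all assumptions made for the example. It treats a single quote as a field delimiter only when it is not preceded by a backslash, so it would still be fooled by an escaped backslash directly before a closing quote.

```python
import re

# A single-quoted field: opening quote, lazily matched content, closing quote.
# The negative lookbehind (?<!\\) rejects any quote preceded by a backslash.
FIELD = re.compile(r"(?<!\\)'(.*?)(?<!\\)'")

def drop_field(line, index):
    """Return `line` without the field at zero-based `index` (hypothetical helper)."""
    fields = FIELD.findall(line)
    del fields[index]
    # Re-wrap every remaining field in single quotes and rejoin with commas.
    return ",".join("'" + f + "'" for f in fields)

# Shortened, made-up sample in the same shape as the data in the question.
sample = r"""'1','data2','\'data24 with, commas and "quotes".','data25'"""
print(drop_field(sample, 2))
```

For the real file, the field to drop would be index 23 (field 24, counting from zero).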

Sjoerd
A: 

If you're not constrained to use a regex, the UNIX cut command is made for this sort of task. This simple utility is available on Windows platforms too, for example at http://unxutils.sourceforge.net.

Edit: Thanks JPro -- you're right, the download link on the referenced SourceForge page doesn't work. They can be downloaded from http://sourceforge.net/projects/unxutils/

Dave M
link does not work
JPro
This breaks when any of the fields contain commas (for example, field 24 contains commas).
Ken Bloom
A: 

I'm not sure if I understand you correctly. Do you want to remove field number 24?

To keep only the first L fields from the left and the last R fields from the right (thus excluding fields L+1, ..., NF-R, where NF is the number of fields), without worrying about odd characters in the fields in between, you can use the following awk command:

awk 'BEGIN {FS=","; L=23; R=2} { for(i=1; i<=L; i++) printf("%s,", $i); for(i=NF-R+1; i<=NF; i++) printf("%s%s", $i, (i<NF ? "," : "\n")) }' your_file
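For readers without awk at hand, the same keep-L-from-the-left, keep-R-from-the-right idea can be sketched in Python. The function name and the short sample line are made up for illustration. As with the awk command, this only works because exactly one field contains stray commas and the number of fields on either side of it is known.

```python
def keep_outer_fields(line, left, right):
    """Keep the first `left` and last `right` comma-separated pieces of `line`."""
    parts = line.rstrip("\n").split(",")
    # Slicing from len(parts) - right (rather than -right) also handles right == 0.
    return ",".join(parts[:left] + parts[len(parts) - right:])

# Made-up sample: the third field contains commas, so a naive split yields 7 pieces.
sample = "'a','b','x, y, z','c','d'"
print(keep_outer_fields(sample, 2, 2))
```

For the file in the question, the call would use `left=23` and `right=2`.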

As Dave M mentioned, you can get tools like cut (and awk) for Windows from here (this particular package contains gawk, which should work just as well with the same command).

Edit: Yeah, the download link at SourceForge does not seem to work. You can get awk and cut from here:

awk: http://gnuwin32.sourceforge.net/packages/gawk.htm

cut: http://gnuwin32.sourceforge.net/packages/coreutils.htm

zifot
This breaks when any of the fields contain commas (for example, field 24 contains commas).
Ken Bloom
Well, as far as I understand the question, commas ARE ACTUALLY USED to separate fields; only field number 24 has them in its content, which is the essence of the problem.
zifot
@zifot, that's my main problem.
JPro
@JPro: OK then. So is there something wrong with my awk snippet? Or does it have to be done in Notepad++, period?
zifot
A: 

I suggest using something like Ruby's CSV library to read the file in, process it programmatically, and write it out again.
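To show what the read, process, write approach looks like, here is a minimal sketch using Python's standard `csv` module rather than Ruby's CSV library. It assumes the file uses single quotes as the quote character and a backslash as the escape character, as the sample in the question suggests. The `drop_column` helper and the shortened sample are hypothetical.

```python
import csv
import io

# Dialect options assumed from the question's sample: '...' quoting, \' escaping.
DIALECT = dict(quotechar="'", escapechar="\\", doublequote=False)

def drop_column(text, index):
    """Parse `text` as CSV, delete the zero-based column `index`, and re-serialize."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n", **DIALECT)
    for row in csv.reader(io.StringIO(text), **DIALECT):
        del row[index]
        writer.writerow(row)
    return out.getvalue()

# Shortened, made-up sample in the same shape as the data in the question.
sample = "'1','data2','\\'data24 with, commas','data25'\n"
print(drop_column(sample, 2), end="")
```

Because the parser understands quoting, the commas inside the problem field are never mistaken for separators, so no assumption about the field's position from the right is needed.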

Ken Bloom