This is an extension of a related question answered here.

I have a weekly CSV file which needs to be parsed. It looks like this:

"asdf","asdf","asdf","asdf"

But sometimes a text field contains extra unescaped double quotes, like this:

"asdf","as "something" df","asdf","asdf"

From the other posts on here, I was able to put together a regex

(?m)""(?![ \t]*(,|$))

which matches two successive double quotes, but only if they are NOT followed by a comma or the end of the line, with optional spaces and tabs in between.
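To make the limitation concrete, here is a minimal sketch in Python (an assumption for illustration; the flavor actually in use is different) that counts what this pattern finds. The helper name `count_doubled` and the sample lines are hypothetical.

```python
import re

# The pattern from the question, unchanged: two successive double quotes
# that are not followed (after optional spaces/tabs) by a comma or end of line.
DOUBLED = re.compile(r'(?m)""(?![ \t]*(,|$))')

def count_doubled(line: str) -> int:
    """Count the doubled-quote matches in one line."""
    return sum(1 for _ in DOUBLED.finditer(line))

# Doubled quotes inside a field are found...
print(count_doubled('"asdf","as ""x"" df","asdf"'))              # 2
# ...an empty field "" is left alone because a comma follows it...
print(count_doubled('"asdf","","asdf"'))                         # 0
# ...but single stray quotes around a word are not matched at all.
print(count_doubled('"asdf","as "something" df","asdf","asdf"'))  # 0
```

The last case is the one this question is about: the pattern never fires because there is no pair of adjacent quotes.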

Now, this finds only double quotes in succession. How do I modify it to find and replace or delete the double quotes around "something" in the file?

Thanks.

+2  A: 
(?<!^|,)"(?!,|$)

will match a double quote that is neither preceded nor followed by a comma and is not at the start or end of a line.
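As a rough illustration, the same idea can be sketched in Python. This is an assumption made for the sake of a runnable example, not the flavor under discussion: Python's `re` module requires fixed-width lookbehind and rejects `(?<!^|,)`, so the lookbehind is split into two assertions with the same meaning. The names `STRAY_QUOTE` and `strip_stray_quotes` are hypothetical.

```python
import re

# Equivalent of (?<!^|,)"(?!,|$), with the lookbehind split in two because
# Python's re only accepts fixed-width lookbehind. MULTILINE makes ^ and $
# apply to every line rather than to the whole input.
STRAY_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)', re.MULTILINE)

def strip_stray_quotes(text: str) -> str:
    """Delete quotes that are not field delimiters."""
    return STRAY_QUOTE.sub("", text)

line = '"asdf","as "something" df","asdf","asdf"'
print(strip_stray_quotes(line))
# "asdf","as something df","asdf","asdf"
```

Only the two quotes around `something` are removed; every quote that touches a comma or a line boundary is kept.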

If you need to allow whitespace around the commas or at start/end-of-line, and if your regex flavor (which you didn't specify) allows arbitrary-length lookbehind (.NET does, for example), you can use

(?<!^\s*|,\s*)"(?!\s*,|\s*$)
Tim Pietzcker
Wow, thanks a bunch, Tim. It's the .NET flavor. I'm currently using a text find/replace function within an automation program called winautomation. However, using either of your regexes in a find and replace also replaces the first and the last double quote of every line. It does find and replace the unescaped double quotes within each text field. So, using `"asdf","as "something" df","asdf"` as an example and replacing with `^`, I get `^asdf","as ^something^ df","asdf^`. How do I remedy the first and last `"`?
stevenjmyu
You need to set the option to allow `^` and `$` to match start and end of lines (instead of the entire input). In .NET, that option is called `RegexOptions.Multiline`. I don't know if you can pass that option to winautomation.
Tim Pietzcker
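The effect of the multiline option can be reproduced with a small sketch. It is written in Python as an assumption for runnability (`re.MULTILINE` plays the role of `RegexOptions.Multiline`), with the lookbehind split into two fixed-width assertions because Python's `re` requires that. `PATTERN` and `data` are hypothetical names and sample input.

```python
import re

PATTERN = r'(?<!^)(?<!,)"(?!,|$)'
data = '"asdf","as "something" df","asdf"\n"asdf","asdf","asdf"\n'

# Without the option, ^ and $ only anchor to the whole input, so quotes at
# interior line boundaries are treated as stray quotes and get replaced.
without_flag = re.sub(PATTERN, "^", data)

# With the option, ^ and $ anchor to every line and the delimiters survive.
with_flag = re.sub(PATTERN, "^", data, flags=re.MULTILINE)

# The inline form (?m), as in the question's own regex, has the same effect.
with_inline = re.sub("(?m)" + PATTERN, "^", data)

print(without_flag)  # first line ends in asdf^, second line starts with ^asdf
print(with_flag)     # only the quotes around something are replaced
```

If a tool offers no way to pass regex options, prefixing the pattern with `(?m)` may be worth trying, since .NET also accepts inline options; whether winautomation honors it is not something this sketch can confirm.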