I have a horribly formatted, tab-delimited "CSV" that I'm trying to clean up.

I would like to quote all the fields; currently only some of them are. I'm trying to go through, tab by tab, and add quotes if necessary.

This RegEx will give me all the tabs.

\t

This RegEx will give me the tabs that do not END with a " (that is, tabs not followed by one).

\t(?!")

How do I get the tabs that do not start with a " (that is, tabs not preceded by one)?

+2  A: 

Use negative lookbehind: (?<!")\t
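
To see the lookahead and the lookbehind side by side, here is a minimal sketch in Python (the thread never names a language, so that choice is an assumption, and the sample row is made up). It lists which tabs each pattern matches, then uses both to add the missing quotes around the tabs. The start and end of each line are not handled here and would need separate treatment.

```python
import re

# Hypothetical sample row: three tab-separated fields, only the middle one quoted.
row = 'alpha\t"beta"\tgamma'

# Tabs NOT followed by a quote (the negative lookahead from the question).
not_followed = [m.start() for m in re.finditer(r'\t(?!")', row)]

# Tabs NOT preceded by a quote (the negative lookbehind from this answer).
not_preceded = [m.start() for m in re.finditer(r'(?<!")\t', row)]

# Add a closing quote before every tab that lacks one,
# then an opening quote after every tab that lacks one.
step1 = re.sub(r'(?<!")\t', '"\t', row)
step2 = re.sub(r'\t(?!")', '\t"', step1)

print(not_followed)   # [12]
print(not_preceded)   # [5]
print(repr(step2))    # 'alpha"\t"beta"\t"gamma'
```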

Greg Bacon
+2  A: 

Generally, for these kinds of problems, if it's a one-time occurrence, I will use Excel's capabilities or other applications (SSIS? T-SQL?) to produce the desired output.

A general-purpose regex will usually run into bizarre exceptions, and getting it just right often takes longer than expected and is prone to missing cases your regex didn't anticipate.

If this is going to happen regularly, try to fix the problem at the source and/or create a special utility program to do it.

hova
I agree that the ideal solution would be a fix at the source of the CSV; unfortunately, I can't control it. This process has to be automated because users will be uploading these nasty files and expecting them to be imported magically.
chap
Just remember that the regular expression may seem like the best solution until your users start getting "creative". You run the risk of your regex running fine with no errors, but still resulting in garbled output.
hova
A: 

For one-shots like this I usually just write a little program to clean up the data; that way I can also add some validation to make sure it really has converted properly after the run. I have nothing against regex, but at least in my case, figuring out the regex and making sure it works takes longer than writing a small program. :)

edit: come to think of it, the main motivator is that it is more fun - for me at least :)
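
As a rough illustration of the "little program plus validation" idea, here is a sketch in Python using the standard csv module. The function names and the sample data are hypothetical, and it assumes the existing quotes in the file are consistent enough for the parser to read. A badly broken file could still parse into the wrong fields, which is why the validation step is there.

```python
import csv
import io


def quote_all_fields(text):
    """Parse tab-delimited text and re-emit it with every field quoted."""
    reader = csv.reader(io.StringIO(text, newline=''), delimiter='\t', quotechar='"')
    rows = list(reader)

    out = io.StringIO(newline='')
    writer = csv.writer(out, delimiter='\t', quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerows(rows)
    return rows, out.getvalue()


def validate(original_rows, cleaned_text):
    """Check that the cleaned text parses back to the same rows
    and that every row has the same number of fields."""
    reparsed = list(csv.reader(io.StringIO(cleaned_text, newline=''),
                               delimiter='\t', quotechar='"'))
    same_data = reparsed == original_rows
    same_width = len({len(r) for r in reparsed}) <= 1
    return same_data and same_width


# Hypothetical input: only the middle column is quoted.
sample = 'id\t"name"\tcity\n1\t"Ann"\tOslo\n'
rows, cleaned = quote_all_fields(sample)

print(cleaned)
print(validate(rows, cleaned))
```

Going through a real parser instead of a regex means embedded tabs or doubled quotes inside quoted fields are handled by the csv module's own rules rather than by pattern matching.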

Anders K.