views:

2672

answers:

5

I have a CSV file I need to clean up. This is a one-time thing so I'd like to do it in Notepad++ if possible.

The CSV file has two fields, one of which is wrapped in quotes. I'd like to remove any Carriage Returns from within the quoted field. I was trying to use this pattern but can't get it quite right...

(.*)\"(.*)\n(.*)\"(.*)

Also correct me if I am wrong, but I presume the "replace with" value would be something along the lines of:

\1\2\3\4

Thanks in advance.

I'm also open to alternate solutions such as a quick and dirty PERL script.

A: 

The problem with Notepad++ is that it doesnt allow you to run a regex with special characters like \n or \t. Regex has to be pure regex and for special characters there is the extended search mode.

I would suggest you run your regex on Eclipse if possible.

If thats not possible, here's a quick and dirty solution

Replace all \n with a special character say # or & which will not occur in your file otherwise.

Next run your regex to look for this special character within quotes and delete it

Finally replace the remaining special characters with \n

Shreyas
Barry
A: 

Click the π button to show the hidden chars. Then select the one carriage return char. Press Ctr+H and replace with an empty string

Hope it works for you.

Jonathan
I only want to replace carriage returns within quotes, this would replace all.
Barry
A: 

Well... my RegEx is terrible and I can't answer your question. However, here is a nice little JS function that you can use that should be able to do what you want.

function removeNewLines(str){
    var quotedStrings = str.split(/["'](.*)?["']/g),
        i = 0;

    for( ; i < quotedStrings.length; i++){
        str = str.replace(quotedStrings[i], quotedStrings[i].replace(/[\r\n]/g,""));
    }
    return str;
}
removeNewLines("\"asdf\r\nas\"asdf\'as\nd\'asdf\"asdf\r\nasf\r\n\"") === "\"asdfas\"asdf'asd'asdf\"asdfasf\"";
illvm
+2  A: 

After much hunting around StackOverflow I found a regex pattern suggested for a similar problem that I only had to modify slightly from single quote to double quote. I ran it in PERL. It worked great! Unfortunately I can't find the original post to give credit where credit is due.

Anyway, this is what I finally used. Thanks everyone for your help!

$string123 =~ s/((?:^[^"]*"|(?!^))[^"]*?(?:"[^"]*"[^"]*?)*?)(\n{1,})/$1/g;
Barry
A: 

Here is an answer specifically for Notepad++

Menu: TextFX > TextFX Edit > Delete Blank Lines

Within that menu item there is also a Delete Surplus Blank Lines option.

Haydar