I have a CSV file into which some ^M (DOS carriage return) line endings have crept, and I want to get rid of them, along with the 16 spaces and 3 tabs that follow each one. In other words, I have to merge that line with the next one down. Here's an offending record and a good one as a sample of what I mean:

"Mary had a ^M
                  little lamb", "Nursery Rhyme", 1878
"Mary, Mary quite contrary", "Nursery Rhyme", 1838

I can remove the ^M using sed, as shown below, but I cannot work out how to remove the Unix line end (the newline) to join the lines back up.

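# GNU sed (\r escape); output goes to a new file, because "> rhymes.csv" truncates the input before sed reads it
sed -e 's/\r$//' rhymes.csv > rhymes_fixed.csv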

UPDATE

Then I read this in the sed FAQ (http://sed.sourceforge.net/sedfaq4.html): "However, the Microsoft CSV format allows embedded newlines within a double-quoted field. If embedded newlines within fields are a possibility for your data, you should consider using something other than sed to work with the data file."

So I'm editing my question to ask: which tool should I be using?
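One tool-agnostic way to cope with embedded newlines is to stop thinking in lines and track whether you are still inside a quoted field. The sketch below uses awk, which ships with any POSIX system. It assumes standard CSV quoting, where a literal quote inside a field is doubled (""), so an odd running count of " characters means the record is not finished yet. It also assumes every continuation line starts with the 16-space plus 3-tab filler described above. The function name `join_csv_records` is hypothetical.

```shell
#!/bin/sh
# Sketch: rejoin CSV records whose quoted fields contain embedded line breaks.
# Assumptions (not from the original post):
#   - a literal quote inside a field is written as "" (standard CSV escaping),
#     so an odd running count of " characters means a field is still open;
#   - each continuation line starts with the 16-space + 3-tab filler from the
#     question, which is stripped when the lines are joined.
# join_csv_records is a hypothetical helper name; it reads files or stdin.
join_csv_records() {
  awk '
    {
      sub(/\r$/, "")                        # drop a trailing DOS carriage return
      if (inside) {
        sub(/^                \t\t\t/, "")  # strip the filler, then append
        buf = buf $0
      } else {
        buf = $0                            # start of a new record
      }
      line = $0
      quotes += gsub(/"/, "", line)         # gsub returns the match count
      inside = (quotes % 2 == 1)
      if (!inside) {                        # every quoted field is closed
        print buf
        quotes = 0
      }
    }
    END { if (inside) print buf }           # flush an unterminated record
  ' "$@"
}

printf '"Mary had a \r\n                \t\t\tlittle lamb", "Nursery Rhyme", 1878\n"Mary, Mary quite contrary", "Nursery Rhyme", 1838\n' |
  join_csv_records
```

Counting quotes rather than matching the exact filler is what lets this survive a record broken across more than two lines. For messier data, a real CSV parser (for example the csv module in Python's standard library, which handles newlines inside quoted fields) is the more robust choice.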

A:

With help from http://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n/1252191#1252191, I made this one:

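sed -i -e ':a;N;$!ba;s/\r\n                \t\t\t//g' rhymes.csv

The gap between \r\n and \t\t\t in the pattern is <16 spaces>, followed by <3 tabs>. The \r, \n and \t escapes and the -i option are GNU sed features.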
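To check the one-liner against the sample from the question, the sketch below rebuilds the broken record in a scratch file (the name `rhymes_sample.csv` is hypothetical) and runs the join on it. It assumes GNU sed, for the same reasons as above.

```shell
#!/bin/sh
# Sketch: reproduce the broken record from the question and apply the sed join.
# Requires GNU sed: \r, \n and \t escapes in the regex and -i are GNU extensions.
sample=rhymes_sample.csv   # hypothetical scratch file name
printf '"Mary had a \r\n                \t\t\tlittle lamb", "Nursery Rhyme", 1878\n' > "$sample"
printf '"Mary, Mary quite contrary", "Nursery Rhyme", 1838\n' >> "$sample"

# :a;N;$!ba slurps the whole file into the pattern space, so the final s///
# can match across line boundaries; the g flag fixes every broken record in
# the file, not just the first one.
sed -i -e ':a;N;$!ba;s/\r\n                \t\t\t//g' "$sample"

cat "$sample"
```

Because the whole file is held in memory, this is fine for modest CSV files but not for very large ones.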

If you just want to delete the CR, you could use:

tr -d '\r' < yourfile > yourfile.tmp && mv yourfile.tmp yourfile
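Piping a file through tr and back into itself with tee (cat f | tr ... | tee f) is a race: tee can truncate f before cat has finished reading it. A minimal sketch of the safer write-then-rename pattern, assuming any POSIX tr and a writable directory; the `strip_cr` name and the `.tmp` suffix are hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch: delete every carriage return with tr without clobbering the input.
# Output goes to a temporary file first, which then replaces the original.
# strip_cr and the .tmp suffix are hypothetical choices for illustration.
strip_cr() {
  tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

printf 'line one\r\nline two\r\n' > crlf_sample.txt   # hypothetical test file
strip_cr crlf_sample.txt
cat crlf_sample.txt
```

The && ensures the original is only replaced if tr succeeded. Note that this removes the CRs but, as the comments below point out, does not join the broken lines.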
Lekensteyn
I tried that and it doesn't work for me; the closest I get is typing Ctrl-V Ctrl-M to generate ^M.
Cups
Sorry, updated my post now.
Lekensteyn
tr is neat, but it did not join the lines. The sed solution worked; I can then go on and use tr to squeeze any consecutive spaces throughout the file. Thanks a lot.
Cups
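tr translates (or deletes) individual characters. If you wanted to delete the LF too, you could use `tr -d '\r\n' < yourfile > yourfile.tmp && mv yourfile.tmp yourfile`, but that removes every newline in the file, not just the ones inside broken records.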
Lekensteyn
A: 
dos2unix file_name

to convert the file in place, or

dos2unix -n old_file new_file

to write the converted result to a new file.

Chance
Thanks. That still leaves me with the problem of finding and removing this line end in the middle of a record, though.
Cups