I am trying to clean up some data, and I would eventually like to put it in CSV form.

I have used some regular expressions to clean it up, but I'm stuck on one step. I would like to replace all but every third newline (\n) with a comma.

The data looks like this.

field1
field2
field3
field1
field2
field3

etc.

I need it in this form:

field1,field2,field3
field1,field2,field3

Does anyone have a simple way to do this using sed or awk? I could write a program and use a loop with a mod counter to erase every 1st and 2nd newline character, but I'd rather do it from the command line if possible.

Thanks.

+5  A: 

With awk:

awk '{n2=n1;n1=n;n=$0;if(NR%3==0){printf"%s,%s,%s\n",n2,n1,n}}' yourData.txt

This script saves the last three lines and prints them at every third line. Unfortunately, it only works with files whose line count is a multiple of 3; any leftover lines at the end are silently dropped.
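To make the limitation concrete, here is a small sketch that runs the same one-liner on a 7-line file. The sample file and the wrapper name last_three are made up for illustration only.

```shell
# Build a 7-line sample (7 is not a multiple of 3) in a temporary file.
sample=$(mktemp)
printf '%s\n' a1 a2 a3 b1 b2 b3 c1 > "$sample"

# Hypothetical wrapper around the one-liner above, so it can be reused.
last_three() {
  awk '{n2=n1;n1=n;n=$0;if(NR%3==0){printf"%s,%s,%s\n",n2,n1,n}}' "$1"
}

# Prints two full records; the leftover line c1 never appears.
last_three "$sample"
```

The seventh line is read into n but nothing is ever printed for it, because NR%3 never reaches 0 again before the input ends.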

A more general script is:

awk '{l=l$0;if(NR%3==0){print l;l=""}else{l=l","}}END{if(l!=""){print substr(l,1,length(l)-1)}}' yourData.txt

In this case, the lines are accumulated in a single string, with a comma appended after each line whose number is not a multiple of 3. On every third line the string is printed and reset. At the end of the file, if the string is not empty (the line count was not a multiple of 3), it is printed with its trailing comma removed.
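For readability, the same logic can be spread over several lines. This is only a sketch: the function name join3 and the sample data are invented for the example, and the awk program is otherwise the one-liner above.

```shell
# Build a 7-line sample (not a multiple of 3) in a temporary file.
sample=$(mktemp)
printf '%s\n' a1 a2 a3 b1 b2 b3 c1 > "$sample"

# Hypothetical wrapper: join every three lines of file $1 with commas.
join3() {
  awk '{
    l = l $0                  # append the current line to the buffer
    if (NR % 3 == 0) {        # every third line: print the record and reset
      print l
      l = ""
    } else {
      l = l ","               # otherwise add a separator
    }
  }
  END {
    # Leftover lines: print them without the trailing comma.
    if (l != "") print substr(l, 1, length(l) - 1)
  }' "$1"
}

join3 "$sample"
```

On this sample the output has three lines: a1,a2,a3 then b1,b2,b3 then c1 on its own, so the incomplete last record is kept rather than dropped.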

mouviciel
+1  A: 

cat file | perl -ne 'chomp(); print $_, !(++$i%3) ? "\n" : ",";'

jj33
+3  A: 

Awk version:

awk '{if (NR%3==0){print $0;}else{printf "%s,", $0;}}'
ashawley
+4  A: 

A Perl solution that's a little shorter and that handles files that don't have a multiple of 3 lines:

perl -pe 's/\n/,/ if(++$i%3&&! eof)' yourData.txt
J. A. Faucett
Good one on the non-multiple-of-three files. I knew mine didn't handle it but didn't see the solution in the 3 minutes I spent on this.
jj33
A: 

vim version:

:1,$s/\n\(.*\)\n\(.*\)\n/,\1,\2\r/g
chappar
A: 

Use nawk or /usr/xpg4/bin/awk on Solaris:

awk 'ORS=NR%3?OFS:RS' OFS=, infile
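This one is terse, so here is a rough expansion of what it does. The assignment to ORS is used as the pattern; its value ("," or a newline) is a non-empty string, hence true, so the default action (print the line) runs with the freshly chosen terminator. The function name join3_ors and the sample file below are illustrative only.

```shell
# Build a 6-line sample in a temporary file.
sample=$(mktemp)
printf '%s\n' a1 a2 a3 b1 b2 b3 > "$sample"

# Hypothetical wrapper: long-hand version of the ORS trick.
join3_ors() {
  awk '{
    # Pick the terminator for this line: a comma after lines 1 and 2
    # of each group, a newline after every third line.
    ORS = (NR % 3) ? "," : "\n"
    print        # prints $0 followed by the ORS chosen above
  }' "$1"
}

join3_ors "$sample"
```

Note that if the line count is not a multiple of 3, the output ends with a trailing comma and no final newline, since the last line is still terminated with ",".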
radoulov
A: 

awk '{ORS=NR%3?",":"\n";print}' urdata.txt