views: 464
answers: 7

How can I replace all line endings in a big file (>100MB)? I have tried

:%s/\n/, /g

but it's too slow.

+2  A: 

Use this Perl script to go through your file; it'd be faster than holding everything in memory in Vim. Just pipe the output to a new file.

#!/usr/local/bin/perl

# read input line by line; each line arrives with its trailing newline,
# which the substitution turns into a comma
while (<>) {
  $_ =~ s/\n/,/g;
  print $_;
}
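
A usage sketch, assuming the script is saved as join_lines.pl (the filename is illustrative); the second command is the chomp variant discussed in the comments below:

perl join_lines.pl input_file > output_file

perl -e 'while (<>) { chomp; print "$_," }' input_file > output_file
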
OverClocked
I'm guessing the Perl interpreter isn't smart enough to know that in this case `$_` cannot contain a newline except as the last character; `chomp` is probably a lot faster.
Jefromi
@Jefromi: In my totally unscientific testing, using chomp is about 300 ms faster on a 100 MB file.
Seamus
A: 

tr '\n' ',' < input_file > output_file
pajton
This is almost certainly faster than the solutions I posted, but unfortunately, it substitutes "," instead of ", " as the OP requested. I'm not sure there's a way to do that with `tr`, is there?
Jefromi
tr only takes single characters.
ghostdog74
No, there is not; I didn't notice the space there. To put in more than one character, one could use sed, as someone posted below.
pajton
Yeah, but sed is really not a good option - it's doing the same regex substitution that's too slow in Vim.
Jefromi
I know about this command, but I'm trying to find a Vim-only solution, without using any external tools.
Frankovskyi Bogdan
+2  A: 

:%s/$/, / followed by a :1,$j might be faster. Otherwise, do it in an external utility:

perl -e 'while (<>) { chomp; print "$_, " }' input_file > output_file

awk '{printf("%s, ", $0)}' input_file > output_file

Don't know off the top of my head which would be fastest.

Jefromi
`perl -ne 'chomp; print "$_, "' file`. The `-n` switch supplies the implicit while loop.
ghostdog74
Good call on the `-n`.
Jefromi
@sparkkkey, "perl will run faster" is not justified.
ghostdog74
@ghostdog74: You're right, it isn't. In fact, it is fairly comparable, as are python and tr.
Seamus
A: 
$ more file
aaaa
bbbb
cccc
dddd
eeee

$ awk 'NR>1{printf("%s, ", p)}{p=$0}END{print p}' file
aaaa, bbbb, cccc, dddd, eeee

$ sed -e :b -e '$!N;s/\n/, /;tb' file
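
For readability, a commented expansion of the awk one-liner (behavior unchanged):

awk '
  NR > 1 { printf("%s, ", p) }   # from the second line on, emit the previous line plus ", "
  { p = $0 }                     # buffer the current line
  END { print p }                # the last line gets no trailing separator
' file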
ghostdog74
Did you test your sed command? sed 'N;s/\n/, /' file
sparkkkey
Not really. It's a cut and paste from a wiki, but I guess the wiki can't be trusted sometimes.
ghostdog74
A: 

Just so that sed doesn't feel left out.

sed -i ':a;N;$!ba;s/\n/, /g' file1

It will not run as fast as perl.
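
Expanded with comments (GNU sed, which -i already implies; a sed label consumes the rest of its line, so each command goes on its own line):

sed -i '
  :a
  # append the next input line to the pattern space
  N
  # until the last line is reached, branch back to label a
  $!ba
  # the whole file is now in the pattern space; replace every newline
  s/\n/, /g
' file1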

sparkkkey
Did you test your sed command?
ghostdog74
A: 

The best tool is sed, and you can use it with the :! command:

so use :!sed -e 's/\n/,/g' % > %.tmp ; cat %.tmp > % ; rm %.tmp

You need to create a tmp file with the changes before integrating them into your current file.
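
A variant of the same idea, filtering the buffer through the external command directly instead of going through a temp file (a sketch; the perl form produces the ", " separator the question asks for):

:%!perl -pe 's/\n/, /'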

shingara
Did you test your sed command?
ghostdog74
Yes, I tested it before posting.
shingara
+3  A: 

So, I went through and tested/timed some of the answers given by other people, plus a python answer of my own. Here is what I got:

tr:

> time tr "\n" "," < lines > line
real    0m1.617s
user    0m0.100s
sys     0m1.520s

python:

> time python -c 'import sys; print sys.stdin.read().replace("\n",", "),' < lines > line
real    0m1.663s
user    0m0.060s
sys     0m1.610s

awk:

> time awk '{printf("%s, ", $0)}' lines > line                                 
real    0m1.998s
user    0m0.390s
sys     0m1.600s

perl:

> time perl -e 'while (<>) { chomp; print "$_, " }' lines > line
real    0m2.100s
user    0m0.590s
sys     0m1.510s

sed:

> time sed 's/$/, /g' lines > line                                             
real    0m6.673s
user    0m5.050s
sys     0m1.630s

Here is the file I used:

> ls -lh lines
-rw-r--r-- 1 some one 101M 2010-03-04 19:54 lines
> wc -l < lines
1300000
> head -n 3 < lines
The pretty pink puma pounced on the unsuspecting aardvark, the scientist watched.
The pretty pink puma pounced on the unsuspecting aardvark, the scientist watched.
The pretty pink puma pounced on the unsuspecting aardvark, the scientist watched.
> head -n 1 < lines | wc -c
82

Originally the timings were taken in Cygwin; they have now been taken on a fully updated Ubuntu 9.10. Also, the text file's size was increased to 100 MB, with lines roughly 80 characters wide. As you can see, pretty much anything other than sed is a good idea.
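
For anyone reproducing these numbers, a sketch for generating a comparable test file (the line text is the sample shown above; 1,300,000 lines of 82 bytes each comes to roughly 101 MB):

yes 'The pretty pink puma pounced on the unsuspecting aardvark, the scientist watched.' | head -n 1300000 > lines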

Seamus
I am very suspicious of your awk results. Time your commands a few times, not just once. Python should not be faster than awk, considering it takes time to import modules and such.
ghostdog74
It was run a few times; that was about the average. I just ran it about 10 more times: 1.7xx each time. Maybe it would be different if I weren't using Cygwin awk.
Seamus
@ghostdog74: You were right to suspect my awk results; I re-ran it on a real Linux box, and it was much faster.
Seamus