How can I replace all line endings in a big file (>100 MB) with ", "? I have tried
:%s/\n/, /g
but it's too slow.
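Use this Perl script to go through your file line by line; it should be faster than holding everything in memory in Vim. Just redirect the output to a new file:
#!/usr/bin/env perl
use strict;
use warnings;

# Read the input one line at a time and replace each newline with ", ".
while (<>) {
    s/\n/, /;
    print;
}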
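The same line-at-a-time idea as a Python 3 sketch, for comparison. The function name join_lines and its sep parameter are hypothetical. It joins with ", " as in the question and, like the Perl script, leaves a separator after the last line.

```python
import sys
from typing import Iterable, Iterator


def join_lines(lines: Iterable[str], sep: str = ", ") -> Iterator[str]:
    """Yield each line with its trailing newline (if any) replaced by sep."""
    for line in lines:
        if line.endswith("\n"):
            line = line[:-1] + sep
        yield line


if __name__ == "__main__":
    # Iterating over sys.stdin reads one line at a time, so memory use stays
    # flat no matter how big the input file is.
    sys.stdout.writelines(join_lines(sys.stdin))
```

Usage would be something like python3 join_lines.py < input_file > output_file (the script name is just an example).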
:%s/$/, /
followed by a :1,$j
might be faster. Otherwise, do it in an external utility:
perl -e 'while (<>) { chomp; print "$_, " }' input_file > output_file
awk '{printf("%s, ", $0)}' input_file > output_file
Don't know off the top of my head which would be fastest.
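Another external option, sketched in Python 3 under the assumption of Unix "\n" line endings: read fixed-size binary blocks instead of lines, closer in spirit to tr than to the line-oriented perl and awk versions. Because the newline is a single byte it can never straddle two blocks, so each block can be processed on its own. The name replace_newlines and the 1 MiB CHUNK_SIZE are arbitrary choices, not from any answer here.

```python
import sys
from functools import partial
from typing import Iterable, Iterator

CHUNK_SIZE = 1 << 20  # 1 MiB per read; arbitrary, tune as needed


def replace_newlines(chunks: Iterable[bytes], sep: bytes = b", ") -> Iterator[bytes]:
    """Yield each block with every newline byte replaced by sep."""
    for chunk in chunks:
        yield chunk.replace(b"\n", sep)


if __name__ == "__main__":
    # iter(callable, sentinel) keeps calling read() until it returns b"" (EOF).
    chunks = iter(partial(sys.stdin.buffer.read, CHUNK_SIZE), b"")
    sys.stdout.buffer.writelines(replace_newlines(chunks))
```

Working on bytes also skips text decoding, which avoids a per-character cost for large ASCII files.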
$ more file
aaaa
bbbb
cccc
dddd
eeee
$ awk 'NR>1{printf("%s, ", p)}{p=$0}END{print p}' file
aaaa, bbbb, cccc, dddd, eeee
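The awk trick above prints the separator before every line except the first, so no trailing ", " is left over, and END{print p} ends the output with a newline. A Python 3 sketch of the same idea, still streaming; join_without_trailing is a hypothetical name:

```python
import sys
from typing import Iterable, Iterator


def join_without_trailing(lines: Iterable[str], sep: str = ", ") -> Iterator[str]:
    """Yield output pieces with sep only *between* lines, then one final newline."""
    first = True
    for line in lines:
        if not first:
            yield sep  # separator goes before every line except the first
        yield line.rstrip("\n")
        first = False
    yield "\n"  # like awk's END{print p}, finish with a single newline


if __name__ == "__main__":
    sys.stdout.writelines(join_without_trailing(sys.stdin))
```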
$ sed -e :b -e '$!N;s/\n/, /;tb' file
Just so that sed doesn't feel left out.
sed -i ':a;N;$!ba;s/\n/, /g' file1
It will not run as fast as perl.
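The sed -i one-liner slurps the whole file (the :a;N;$!ba loop keeps appending lines to the pattern space until the last one) and then rewrites the file in place. A Python 3 sketch of that slurp-and-overwrite approach; join_text and join_file_in_place are hypothetical names, and like the sed version this needs the whole file in memory at once.

```python
import sys
from pathlib import Path


def join_text(text: str, sep: str = ", ") -> str:
    """Replace every newline except a final one with sep."""
    if text.endswith("\n"):
        return text[:-1].replace("\n", sep) + "\n"
    return text.replace("\n", sep)


def join_file_in_place(path: str, sep: str = ", ") -> None:
    """Read the entire file into memory, join its lines, and overwrite it."""
    p = Path(path)
    p.write_text(join_text(p.read_text(), sep))


if __name__ == "__main__":
    join_file_in_place(sys.argv[1])
```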
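The best tool is sed, and you can run it with Vim's :! command:
:!sed -e ':a' -e 'N' -e '$\!ba' -e 's/\n/, /g' % > %.tmp ; cat %.tmp > % ; rm %.tmp
You need to write the changes to a temporary file first and then copy them back over your current file, because redirecting sed's output straight to % would truncate the file before sed reads it. Note that sed reads one line at a time without its trailing newline, so a plain s/\n/,/g never matches; the :a;N;$!ba loop first gathers the whole file into the pattern space. The ! in $!ba is written as \! because Vim would otherwise replace it with the previous external command. Afterwards, reload the buffer with :e.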
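The write-to-a-temp-file-then-copy-back pattern can also be done without holding the file in memory. A Python 3 sketch, assuming Unix "\n" line endings; rewrite_via_temp_file and CHUNK_SIZE are hypothetical. One difference from cat %.tmp > %: os.replace swaps in a new file instead of rewriting the original in place, so hard links to the old file would not see the change.

```python
import os
import shutil
import tempfile

CHUNK_SIZE = 1 << 20  # 1 MiB per read; arbitrary, tune as needed


def rewrite_via_temp_file(path: str, sep: str = ", ") -> None:
    """Replace every newline in `path` with `sep`, going through a temp file."""
    sep_bytes = sep.encode()
    # Create the temp file next to the original so os.replace() is a cheap
    # rename on the same filesystem rather than a cross-device copy.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            while chunk := src.read(CHUNK_SIZE):
                dst.write(chunk.replace(b"\n", sep_bytes))
        shutil.copymode(path, tmp_path)  # keep the original file's permissions
        os.replace(tmp_path, path)  # the original is only touched at this point
    except BaseException:
        os.unlink(tmp_path)  # don't leave a half-written temp file behind
        raise
```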
So, I went through and tested/timed some of the answers that were given by other people, plus a python answer of my own. Here is what I got:
tr:
> time tr "\n" "," < lines > line
real 0m1.617s
user 0m0.100s
sys 0m1.520s
python:
> time python -c 'import sys; print sys.stdin.read().replace("\n",", "),' < lines > line
real 0m1.663s
user 0m0.060s
sys 0m1.610s
awk:
> time awk '{printf("%s, ", $0)}' lines > line
real 0m1.998s
user 0m0.390s
sys 0m1.600s
perl:
> time perl -e 'while (<>) { chomp; print "$_, " }' lines > line
real 0m2.100s
user 0m0.590s
sys 0m1.510s
sed:
> time sed 's/$/, /g' lines > line
real 0m6.673s
user 0m5.050s
sys 0m1.630s
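The Python command timed above uses Python 2's print statement, which is a syntax error in Python 3. A Python 3 sketch of the same read-everything-then-replace approach (replace_all is a hypothetical name); it was not part of these timings, and it still holds the whole file in memory.

```python
import sys


def replace_all(text: str, sep: str = ", ") -> str:
    # One str.replace over the entire input, as in the timed one-liner.
    return text.replace("\n", sep)


if __name__ == "__main__":
    # Python 3's print() appends a newline unless end="" is passed;
    # writing to sys.stdout directly avoids that.
    sys.stdout.write(replace_all(sys.stdin.read()))
```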
Here is the file I used:
> ls -lh lines
-rw-r--r-- 1 some one 101M 2010-03-04 19:54 lines
> wc -l < lines
1300000
> head -n 3 < lines
The pretty pink puma pounced on the unsuspecting aardvark, the scientist watched.
The pretty pink puma pounced on the unsuspecting aardvark, the scientist watched.
The pretty pink puma pounced on the unsuspecting aardvark, the scientist watched.
> head -n 1 < lines | wc -c
82
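Originally the timings were taken in Cygwin; they have now been taken on a fully updated Ubuntu 9.10. The text file's size was also increased to 100 MB, with lines about 80 characters wide. Note that the outputs are not all identical: tr writes a bare "," rather than ", ", and the sed command timed here only appends ", " to each line without removing the newlines. As you can see, pretty much anything other than sed is a good idea.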
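To build a test file of the same shape, here is a Python 3 sketch that writes 1,300,000 copies of the 82-byte line shown above (81 characters plus the newline). make_test_file and the output name lines are assumptions; the result should be roughly the 100 MB file described, though the original was not necessarily generated this way.

```python
import os

LINE = "The pretty pink puma pounced on the unsuspecting aardvark, the scientist watched.\n"


def make_test_file(path: str, count: int = 1_300_000) -> int:
    """Write `count` copies of LINE to `path`; return the file size in bytes."""
    # newline="\n" stops Windows from writing "\r\n" line endings.
    with open(path, "w", encoding="ascii", newline="\n") as f:
        # A generator keeps memory flat instead of building one 100 MB string.
        f.writelines(LINE for _ in range(count))
    return os.path.getsize(path)


if __name__ == "__main__":
    print(make_test_file("lines"))
```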