ansaurus

Question

Quickly remove first n lines from many text files

Answer 1

+6 A:

Use tail; I doubt anything could be significantly faster. With -n +3, output starts at line 3, so this drops the first two lines:

tail -n +3 input.txt > output.txt

Wrap it in your loop of choice. I really doubt sed is much slower, though; as you say, disk I/O is usually the ultimate bottleneck.
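As a minimal sketch of one such loop: the helper below is hypothetical (the name strip_head and the ".tmp" suffix are assumptions, not part of the answer above). It writes the tail output to a temporary file and moves it over the original only if tail succeeded. Note that the replacement file may not keep the original's permissions or ownership.

```shell
# strip_head N FILE...  (hypothetical helper, not from the original answer)
# Rewrites each FILE without its first N lines, going through FILE.tmp.
strip_head() {
    n=$1
    shift
    for f in "$@"; do
        # tail -n +K starts output at line K, so K = N + 1 drops N lines.
        # The ".tmp" suffix is an assumption; pick one that cannot collide.
        tail -n "+$((n + 1))" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

It could be called as: strip_head 2 *.txt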

Jefromi 2010-08-19 12:40:46

Thanks, I've just tried it, and it takes essentially as long to run as the original (just like ghostdog's sed -i.bak), so I suspect it's an I/O bottleneck.

Gyppo 2010-08-19 13:58:52

I know there's no way I'll get an answer but... why'd this get downvoted? It's *the* canonical way to do this operation in *nix.

Jefromi 2010-08-19 18:39:58

Answer 2

+3 A:

for file in *.ext
do
    sed -i.bak -n '3,$p' "$file"
done

or just

sed -i.bak -n '3,$p' *.ext
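The same edit can also be written as a delete instead of a print. The sketch below assumes a sed that supports -i with an attached backup suffix and restarts line numbering for each file when editing in place (GNU sed does; other implementations may differ). The function name strip_head_sed is hypothetical, and N is assumed to be at least 1.

```shell
# strip_head_sed N FILE...  (hypothetical helper, not from the original answer)
# Deletes lines 1..N of each FILE in place, keeping the original as FILE.bak.
strip_head_sed() {
    n=$1
    shift
    # "1,Nd" deletes the first N lines; with -i, addresses are counted
    # per file (assumption: GNU sed behaviour), so every file loses its head.
    sed -i.bak "1,${n}d" "$@"
}
```

It could be called as: strip_head_sed 2 *.ext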

ghostdog74 2010-08-19 12:56:53

That's very nice, thanks, but unfortunately it appears that i/o is the bottleneck.

Gyppo 2010-08-19 13:57:17

Answer 3

+1 A:

I think this will be faster than launching sed:

import os
import shutil

path = '/some/path/to/files/'
for filename in os.listdir(path):
    basename, ext = os.path.splitext(filename)
    fullname = os.path.join(path, filename)
    newname = os.path.join(path, basename + '-out' + ext)
    with open(fullname) as read:
        # skip the first two lines (range works on both Python 2 and 3)
        for _ in range(2):
            read.readline()
        # hand the rest to shutil.copyfileobj
        with open(newname, 'w') as write:
            shutil.copyfileobj(read, write)

nosklo 2010-08-19 15:36:54
