views: 1326
answers: 6

I want to copy the first 1000 lines of a text file containing more than 50 million lines to a new file, and also delete those lines from the original file.

Is there a way to do both with a single shell command in Unix?

+1  A: 
head -1000 file.txt > first1000lines.txt
tail --lines=+1001 file.txt > restoffile.txt
cletus
Upvoted, until I noticed the "and also delete these lines from the original file" requirement.
Brian Campbell
This does not delete lines from the original file.
Alex Reynolds
Have patience. Removing the first 1000 lines and writing the rest back out takes a long, long time.
le dorfier
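To also drop those lines from the original, one option is to write the remainder to a temporary file and move it back over the original (a sketch; the file names are illustrative):

head -1000 file.txt > first1000lines.txt
tail -n +1001 file.txt > file.txt.tmp && mv file.txt.tmp file.txt

Note this still rewrites the remaining ~50 million lines, so it is no faster than the other temp-file approaches below.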
A: 

Looks like a job for awk.

starblue
Some people might actually prefer a solution rather than a vague pointer. This seems helpful only in the very broadest sense of that word.
paxdiablo
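For the curious, a concrete sketch of the awk idea (file names are illustrative): split the stream on the line number, then replace the original with the remainder:

awk 'NR <= 1000 { print > "first1000.txt"; next } { print > "rest.txt" }' file.txt && mv rest.txt file.txt

Like the other approaches here, this reads and rewrites the entire file, so it is unlikely to beat head/tail.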
+9  A: 
head -1000 input > output && sed -i '1,+999d' input

For example:

$ cat input 
1
2
3
4
5
6
$ head -3 input > output && sed -i '1,+2d' input
$ cat input 
4
5
6
$ cat output 
1
2
3
marcog
sed: 1: "input": command i expects \ followed by text
Alex Reynolds
See example -- it works for me.
marcog
This still gives the same error message.
Alex Reynolds
You tried the example I pasted? :-/
marcog
@Alex, do you have a file named 'input'?
Journeyman Programmer
This does not work. Or if it does, it works with a specific version of sed.
Alex Reynolds
I'm using sed 4.1.5
marcog
Okay, I'm using FreeBSD, which does not have the GNU version of sed. I've added an answer with a test run of sed vs. tail that suggests tail is faster. It is only one test, but head/tail/cp/rm have standard implementations across UNIXes and, if faster, may be preferable to sed.
Alex Reynolds
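For reference, BSD sed (as on FreeBSD and macOS) requires an argument to -i and does not support GNU's addr,+N address form, which explains the error above. A rough BSD equivalent of the command would be:

head -1000 input > output && sed -i '' '1,1000d' input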
+2  A: 

This is a one-liner but uses four atomic commands:

head -1000 file.txt > newfile.txt; tail -n +1001 file.txt > file.txt.tmp; cp file.txt.tmp file.txt; rm file.txt.tmp
Alex Reynolds
He wants to *move* the first 1000 lines from one file to another. This deletes all but the first 1000 lines, i.e. is wrong.
marcog
You're right. I'll edit this to fix it.
Alex Reynolds
With "more than 50 million entries" that tail will be quite slow.
marcog
Why are you doing "cp file.txt.tmp file.txt; rm file.txt.tmp" instead of "mv file.txt.tmp file.txt"?
Espo
cp and rm are atomic filesystem operations. mv is not.
Alex Reynolds
Please see my answer below for one uncached trial each of the tail and sed approaches.
Alex Reynolds
+3  A: 

Out of curiosity, I found a box with a GNU version of sed (v4.1.5) and tested the (uncached) performance of two approaches suggested so far, using an 11M line text file:

$ wc -l input
11771722 input

$ time head -1000 input > output; time tail -n +1001 input > input.tmp; time cp input.tmp input; time rm input.tmp

real    0m1.165s
user    0m0.030s
sys     0m1.130s

real    0m1.256s
user    0m0.062s
sys     0m1.162s

real    0m4.433s
user    0m0.033s
sys     0m1.282s

real    0m6.897s
user    0m0.000s
sys     0m0.159s

$ time head -1000 input > output && time sed -i '1,+999d' input

real    0m0.121s
user    0m0.000s
sys     0m0.121s

real    0m26.944s
user    0m0.227s
sys     0m26.624s

This is the Linux box I was working with:

$ uname -a
Linux hostname 2.6.18-128.1.1.el5 #1 SMP Mon Jan 26 13:58:24 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

For this test, at least, it looks like the sed approach is slower than the tail approach (about 27 seconds vs. about 14 seconds in total).

Alex Reynolds
+2  A: 

Perl approach:

perl -ne 'if($i<1000) { print; } else { print STDERR;}; $i++;' in 1> out 2> in.new && mv in.new in
piotr
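Spelled out, the idea is to split a single pass over the file across the two output streams: the first 1000 lines go to one file and the remainder to the other, after which a single mv replaces the original with the remainder. A slightly more readable sketch using Perl's built-in line counter $. (same hypothetical file names):

perl -ne 'if ($. <= 1000) { print STDOUT $_ } else { print STDERR $_ }' in > out 2> in.new && mv in.new in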