I have a CSV file (foo.csv) with 200,000 rows. I need to break it into four files (foo1.csv, foo2.csv... etc.) with 50,000 rows each.

I already tried simple copy/paste (ctrl-c/ctrl-v) in GUI text editors, but my computer slows to a halt.

What unix command(s) could I use to accomplish this task?

+1  A: 
sed -n 2000,4000p somefile.txt

will print lines 2000 through 4000 to stdout; redirect each 50,000-line range of foo.csv to its own file to get the four pieces.

deinst
+2  A: 

I don't have a terminal handy to try it out, but it should be just split -d -l 50000 foo.csv.

Hopefully the naming isn't terribly important. By default the pieces come out as x00 .. x03; pass foo.csv again as the output prefix (split -d -l 50000 foo.csv foo.csv) to get foo.csv00 .. foo.csv03. You can add the -a 1 option so that the suffixes are just 0-3, but there's no simple way to get the suffix injected into the middle of the filename.
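
If you do want the question's exact names (foo1.csv .. foo4.csv), a small rename loop afterwards should do it, assuming the foo.csv00 .. foo.csv03 output described above:

# rename foo.csv00 .. foo.csv03 to foo1.csv .. foo4.csv
i=1
for f in foo.csv0[0-3]; do
    mv "$f" "foo$i.csv"
    i=$((i+1))
done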

Mark Rushakoff
A: 

You can use head and tail.

head -n 50000 myfile > part1.csv
head -n 100000 myfile | tail -n 50000 > part2.csv 
head -n 150000 myfile | tail -n 50000 > part3.csv 

etc ...
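
The last chunk is just the tail of the file, so presumably:

tail -n 50000 myfile > part4.csv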

Otherwise, if you don't need control over the file names, you can use the unix split command.

Guillaume Lebourgeois
+1  A: 

split -l50000 foo.csv
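
By default that leaves the pieces named xaa, xab, xac and xad. If you want something closer to the question's names, split also takes an output prefix as its second argument, e.g.:

split -l 50000 foo.csv foo_

which should give foo_aa .. foo_ad.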

Jeremy
A: 

You can use sed
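
For instance, something along these lines should produce the four pieces for the question's file (untested):

sed -n '1,50000p'       foo.csv > foo1.csv
sed -n '50001,100000p'  foo.csv > foo2.csv
sed -n '100001,150000p' foo.csv > foo3.csv
sed -n '150001,200000p' foo.csv > foo4.csv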

Jon Freedman
A: 

I wrote this little shell script for a question very similar to yours.

This shell script + awk works fine for me:

#!/bin/bash
# Usage: ./script.sh <first_line> <last_line> <file>
# Print the lines of <file> from <first_line> to <last_line>, inclusive.
awk -v initial_line="$1" -v end_line="$2" '{
    if (NR >= initial_line && NR <= end_line)
        print $0
}' "$3"

Used with this sample file (file.txt):

one
two
three
four
five
six

The command (it extracts lines two through four of the file):

edu@debian5:~$ ./script.sh 2 4 file.txt

Output of this command:

two
three
four

Of course, you can improve it, for example by checking that all the arguments have the expected values :-)
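
A minimal version of that check might look something like this (just a sketch, using the same script name and arguments as above):

#!/bin/bash
# Refuse to run unless we got exactly three arguments,
# the line numbers are numeric, and the file is readable.
if [ "$#" -ne 3 ]; then
    echo "Usage: $0 <first_line> <last_line> <file>" >&2
    exit 1
fi
case "$1$2" in
    *[!0-9]*) echo "Line numbers must be positive integers" >&2; exit 1 ;;
esac
if [ ! -r "$3" ]; then
    echo "Cannot read file: $3" >&2
    exit 1
fi
awk -v initial_line="$1" -v end_line="$2" 'NR >= initial_line && NR <= end_line' "$3"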

SourceRebels