Using sed or similar how would you extract lines from a file? If I wanted lines 1, 5, 1010, 20503 from a file, how would I get these 4 lines?

What if I have a fairly large number of lines I need to extract? If I had a file with 100 lines, each representing a line number that I wanted to extract from another file, how would I do that?

A: 

I'd investigate Perl, since it has the regexp facilities of sed plus the programming model surrounding it to allow you to read a file line by line, count the lines and extract according to what you want (including from a file of line numbers).

my $row = 1;
while (<STDIN>) {
   # The current line is in $_; check $row against the list of wanted line numbers.
   $row++;
}
Brian Agnew
And you can use perl -e 'perlcode here' from the command prompt. Perl also has a range operator, .., as in 3..12, which lets you create a list of numbers where needed.
Christian Vik
You should be using `$.`, which automagically contains the current line number.
Hasturkun
@Hasturkun - didn't know that! Thanks.
Brian Agnew
Anybody interested in Perl command line techniques might want to look at Minimal Perl, from Manning... http://manning.com/maher/
Joe Internet
+7  A: 

Something like "sed -n '1p;5p;1010p;20503p' file". Run "man sed" for details.
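
On a large file it can help to stop reading once the last wanted line has been printed, which sed's `q` command allows. A minimal sketch, assuming the line numbers are passed in ascending order; the function name `sed_pick` is made up for this example:

```shell
# sed_pick FILE N1 [N2 ...]  (hypothetical helper; numbers must be ascending)
sed_pick() {
    file=$1
    shift
    [ "$#" -ge 1 ] || return 1          # need at least one line number

    script=
    # Every number except the last becomes a plain "Np;" command.
    while [ "$#" -gt 1 ]; do
        script="${script}${1}p;"
        shift
    done
    # The last number prints its line and then quits, so sed stops reading.
    script="${script}${1}{p;q;}"

    sed -n "$script" "$file"
}
```

For the numbers in the question this would be called as `sed_pick file 1 5 1010 20503`, which runs `sed -n '1p;5p;1010p;20503{p;q;}' file`.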

For your second question, I'd transform the input file into a bunch of sed(1) commands to print the lines I wanted.
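
One way this transformation could look, as a sketch: append `p` to every line number to get a sed script, then hand that script to sed with `-f`. The function name `sed_pick_from_file` and the temporary script file are assumptions for illustration, and the line-number file is assumed to hold one number per line.

```shell
# sed_pick_from_file NUMS_FILE DATA_FILE  (hypothetical helper name)
sed_pick_from_file() {
    nums_file=$1
    data_file=$2
    script=$(mktemp) || return 1

    # Keep only purely numeric lines, then append "p" to each: 5 -> 5p
    grep -E '^[0-9]+$' "$nums_file" | sed 's/$/p/' > "$script"

    # -n suppresses default output; -f reads the generated commands from the file.
    sed -n -f "$script" "$data_file"
    status=$?

    rm -f "$script"
    return "$status"
}
```

Because the commands live in a script file rather than on the command line, this avoids argument length limits even for long lists of line numbers.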

Steve Emmerson
+1, the thing to look up for the second part of the answer is `sed -f`
Michael Krelin - hacker
+3  A: 

With awk it's as simple as:

awk 'NR==1 || NR==5 || NR==1010' "file"
ennuikiller
+1 for using awk.
slebetman
I love awk, but this is clearly the case for sed.
Michael Krelin - hacker
@Michael agreed I was just showing another way
ennuikiller
@michael, nonsense, awk can do that too.
ghostdog74
@ennuikiller, yes, I was mostly commenting on the +1 for using awk in this context. @ghostdog74, so can perl, python, pure bash, etc. It's a matter of opinion on the *right* tool for the job.
Michael Krelin - hacker
+2  A: 

@OP, you can do this more easily and more efficiently with awk. For your first question:

awk 'NR~/^(1|5|1010|20503)$/{print}' file

For your second question:

awk 'FNR==NR{a[$1];next}(FNR in a){print}' file_with_linenr file
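
The one-liner above depends on the `FNR==NR` two-file idiom, which is easier to follow when spelled out. The following is an expanded sketch of the same idea with comments; the wrapper name `awk_pick` is hypothetical, and the early `exit` once all requested lines have been printed is an addition that is not part of the original command.

```shell
# awk_pick NUMS_FILE DATA_FILE  (hypothetical helper name)
awk_pick() {
    awk '
        # NR counts lines across all inputs, FNR restarts at 1 for each file,
        # so FNR == NR is true only while the first file is being read.
        FNR == NR {
            if (!($1 in wanted)) {
                wanted[$1]      # referencing the element creates the key
                total++
            }
            next
        }
        # Second file: print lines whose number was recorded, and stop
        # reading once every requested line has been printed.
        (FNR in wanted) {
            print
            if (++seen == total) exit
        }
    ' "$1" "$2"
}
```

Called as `awk_pick file_with_linenr file`, it prints the selected lines in the order they occur in the data file, and a line number listed twice is still printed only once.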
ghostdog74
my sentiments exactly.
glenn jackman
A: 

This ain't pretty and it could exceed command length limits under some circumstances*:

sed -n "$(while read a; do echo "${a}p;"; done < line_num_file)" data_file

Or its much slower but more attractive, and possibly more well-behaved, sibling:

while read a; do echo "${a}p;"; done < line_num_file | xargs -I{} sed -n \{\} data_file

A variation:

xargs -a line_num_file -I{} sed -n \{\}p\; data_file

You can speed up the xargs versions a little bit by adding the -P option with some large argument like, say, 83 or maybe 419 or even 1177, but 10 seems as good as any. Note that with -P the order of the output lines is no longer guaranteed.

*xargs --show-limits </dev/null can be instructive

Dennis Williamson
A: 

In Perl:

perl -ne 'print if $. =~ m/^(1|5|1010|20503)$/' file
ire_and_curses