I want to print certain lines from a text file in Unix. The line numbers to be printed are listed in another text file (one on each line).
Is there a quick way to do this with Perl or a shell script?
I want to print certain lines from a text file in Unix. The line numbers to be printed are listed in another text file (one on each line).
Is there a quick way to do this with Perl or a shell script?
I wouldn't do it this way with large files, but (untested):
open(my $fh1, "<", "line_number_file.txt") or die "Err: $!";
chomp(my @line_numbers = <$fh1>);
$_-- for @line_numbers;
close $fh1;
open(my $fh2, "<", "text_file.txt") or die "Err: $!";
my @lines = <$fh2>;
print @lines[@line_numbers];
close $fh2;
Assuming the line numbers to be printed are sorted.
open my $fh, '<', 'line_numbers' or die $!;
my @ln = <$fh>;
open my $tx, '<', 'text_file' or die $!;
foreach my $ln (@ln) {
my $line;
do {
$line = <$tx>;
} until $. == $ln and defined $line;
print $line if defined $line;
}
$ cat numbers
1
4
6
$ cat file
one
two
three
four
five
six
seven
$ awk 'FNR==NR{num[$1];next}(FNR in num)' numbers file
one
four
six
I'd do it like this:
#!/bin/bash
numbersfile=numbers
datafile=data
while read lineno < $numbersfile; do
sed -n "${lineno}p" datafile
done
Downside to my approach is that it will spawn a lot of processes so it will be slower than other options. It's infinitely more readable though.
You can avoid the limitations of the some of the other answers (requirements for sorted lines), simply by using eof
within the context of a basic while(<>)
block. That will tell you when you've stopped reading line numbers and started reading data. Note that you need to reset $.
when the switch occurs.
# Usage: perl script.pl LINE_NUMS_FILE DATA_FILE
use strict;
use warnings;
my %keep;
my $reading_line_nums = 1;
while (<>){
if ($reading_line_nums){
chomp;
$keep{$_} = 1;
$reading_line_nums = $. = 0 if eof;
}
else {
print if exists $keep{$.};
}
}
This is a short solution using bash and sed
sed -n -e "$(cat num |sed 's/$/p/')" file
Where num is the file of numbers and file is the input file ( Tested on OS/X Snow leopard)
$ cat num
1
3
5
$ cat file
Line One
Line Two
Line Three
Line Four
Line Five
$ sed -n -e "$(cat num |sed 's/$/p/')" file
Line One
Line Three
Line Five
cat -n foo | join foo2 - | cut -d" " -f2-
where foo is your file with lines to print and foo2 is your file of line numbers
$ cat input every good bird does fly $ cat lines 2 4 $ perl -ne 'BEGIN{($a,$b) = `cat lines`} print if $.==$a .. $.==$b' input good bird does
If that's too much for a one-liner, use
#! /usr/bin/perl
use warnings;
use strict;
sub start_stop {
my($path) = @_;
open my $fh, "<", $path
or die "$0: open $path: $!";
local $/;
return ($1,$2) if <$fh> =~ /\s*(\d+)\s*(\d+)/;
die "$0: $path: could not find start and stop line numbers";
}
my($start,$stop) = start_stop "lines";
while (<>) {
print if $. == $start .. $. == $stop;
}
Perl's magic open allows for creative possibilities such as
$ ./lines-between 'tac lines-between|' print if $. == $start .. $. == $stop; while (<>) {
Here is a way to do this in Perl without slurping anything so that the memory footprint of the program is independent of the sizes of both files (it does assume that the line numbers to be printed are sorted):
#!/usr/bin/perl
use strict; use warnings;
use autodie;
@ARGV == 2
or die "Supply src_file and filter_file as arguments\n";
my ($src_file, $filter_file) = @ARGV;
open my $src_h, '<', $src_file;
open my $filter_h, '<', $filter_file;
my $to_print = <$filter_h>;
while ( my $src_line = <$src_h> ) {
last unless defined $to_print;
if ( $. == $to_print ) {
print $src_line;
$to_print = <$filter_h>;
}
}
close $filter_h;
close $src_h;
Generate the source file:
C:\> perl -le "print for aa .. zz" > src
Generate the filter file:
C:\> perl -le "print for grep { rand > 0.75 } 1 .. 52" > filter
C:\> cat filter 4 6 10 12 13 19 23 24 28 44 49 50
Output:
C:\> f src filter ad af aj al am as aw ax bb br bw bx
To deal with an unsorted filter file, you can modified the while
loop:
while ( my $src_line = <$src_h> ) {
last unless defined $to_print;
if ( $. > $to_print ) {
seek $src_h, 0, 0;
$. = 0;
}
if ( $. == $to_print ) {
print $src_line;
$to_print = <$filter_h>;
}
}
This would waste a lot of time if the contents of the filter file are fairly random because it would keep rewinding to the beginning of the source file. In that case, I would recommend using Tie::File.
Here is a way to do this using Tie::File:
#!/usr/bin/perl
use strict; use warnings;
use autodie;
use Tie::File;
@ARGV == 2
or die "Supply src_file and filter_file as arguments\n";
my ($src_file, $filter_file) = @ARGV;
tie my @source, 'Tie::File', $src_file, autochomp => 0
or die "Cannot tie source '$src_file': $!";
open my $filter_h, '<', $filter_file;
while ( my $to_print = <$filter_h> ) {
print $source[$to_print - 1];
}
close $filter_h;
untie @source;