I want to print certain lines from a text file in Unix. The line numbers to be printed are listed in another text file (one on each line).

Is there a quick way to do this with Perl or a shell script?

A: 

I wouldn't do it this way with large files, but (untested):

open(my $fh1, "<", "line_number_file.txt") or die "Err: $!";
chomp(my @line_numbers = <$fh1>);
$_-- for @line_numbers;
close $fh1;

open(my $fh2, "<", "text_file.txt") or die "Err: $!";
my @lines = <$fh2>;

print @lines[@line_numbers];
close $fh2;
runrig
<3 Pancake Bunny
R. Bemrose
Here's an example: File 1 has this data: Anna, Bob, Cathy, Darren. File 2 has this: 2, 4. I want to use File 2 to determine which lines of File 1 are printed. In this case, I want to print the 2nd and 4th lines of File 1, so my results would be: Bob, Darren. Thanks!
itzy
Hm, the comment didn't format as expected... the files all have just one word or number on each line.
itzy
Ok, I've got the idea now.
runrig
+4  A: 

Assuming the line numbers to be printed are sorted.

open my $fh, '<', 'line_numbers' or die $!;
my @ln = <$fh>;
open my $tx, '<', 'text_file' or die $!;
foreach my $ln (@ln) {
  my $line;
  do {
    $line = <$tx>;
  } until !defined $line or $. == $ln;  # also stop at EOF so a too-large line number cannot loop forever
  print $line if defined $line;
}
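
If the line numbers file is not already sorted, one option is to sort it numerically first. This is a minimal sketch, assuming a POSIX sort; the helper name normalize_line_numbers and the sample data are made up for illustration. Duplicates are dropped as well, since a reader that only moves forward through the text file cannot return to a line it has already passed.

```shell
#!/bin/sh
# normalize_line_numbers is a hypothetical helper name, not part of the answer.
# sort -n orders numerically (so 10 follows 9); -u drops duplicate numbers.
normalize_line_numbers() {
    sort -n -u "$1"
}

# Sample data, made up for this sketch.
tmp=$(mktemp -d)
printf '4\n2\n10\n4\n1\n' > "$tmp/line_numbers"
normalize_line_numbers "$tmp/line_numbers" > "$tmp/line_numbers.sorted"
cat "$tmp/line_numbers.sorted"
# prints:
# 1
# 2
# 4
# 10
```

The sorted copy can then be opened in place of the original line numbers file.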
M42
+1 for using best practices throughout. nice example!
Ether
Thanks and thanks again for correction
M42
+3  A: 
$ cat numbers
1
4
6
$ cat file
one
two
three
four
five
six
seven
$ awk 'FNR==NR{num[$1];next}(FNR in num)' numbers file
one
four
six
ghostdog74
+1 nice and clean :)
nico
for a GNU tools answer, how about sed?
Cole
A: 

I'd do it like this:

#!/bin/bash
numbersfile=numbers
datafile=data

while read -r lineno; do
    sed -n "${lineno}p" "$datafile"
done < "$numbersfile"

The downside of my approach is that it spawns one sed process per line number, so it will be slower than the other options. It's far more readable, though.
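
For comparison, here is a rough sketch of a single-process variant that still prints lines in the order the numbers file lists them, repeats included. It assumes the data file fits in memory; the function name print_lines_in_order and the sample files are hypothetical.

```shell
#!/bin/sh
# print_lines_in_order is a hypothetical name.
# $1 = file of line numbers, $2 = data file.
print_lines_in_order() {
    awk '
        # First file on the command line (the data): remember every line.
        NR == FNR { line[FNR] = $0; next }
        # Second file (the numbers): print the stored line if it exists.
        ($1 + 0) in line { print line[$1 + 0] }
    ' "$2" "$1"
}

# Sample data, made up for this sketch.
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmp/data"
printf '3\n1\n3\n' > "$tmp/numbers"
print_lines_in_order "$tmp/numbers" "$tmp/data"
# prints:
# gamma
# alpha
# gamma
```

The trade-off is memory: awk holds the whole data file in an array, whereas the loop above rereads the file once per line number.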

Daenyth
+2  A: 

You can avoid the limitations of some of the other answers (such as requiring sorted line numbers) simply by using eof within the context of a basic while(<>) block. That will tell you when you've stopped reading line numbers and started reading data. Note that you need to reset $. when the switch occurs.

# Usage: perl script.pl LINE_NUMS_FILE DATA_FILE

use strict;
use warnings;

my %keep;
my $reading_line_nums = 1;

while (<>){
    if ($reading_line_nums){
        chomp;
        $keep{$_} = 1;
        $reading_line_nums = $. = 0 if eof;
    }
    else {
        print if exists $keep{$.};    
    }
}
FM
A: 

This is a short solution using bash and sed

sed -n -e "$(cat num |sed 's/$/p/')" file

Here num is the file of line numbers and file is the input file (tested on OS X Snow Leopard).

$ cat num
1
3
5

$ cat file
Line One
Line Two
Line Three
Line Four
Line Five

$ sed -n -e "$(cat num |sed 's/$/p/')" file
Line One
Line Three
Line Five
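
To see why this works, it can help to run the inner command on its own: it turns each line number into a sed print command, and the outer sed -n then runs that generated script. A small sketch, where the function name make_sed_script is hypothetical and the sample file mirrors num above:

```shell
#!/bin/sh
# make_sed_script is a hypothetical name for the inner command.
# It appends "p" to every line, so "3" becomes the sed command "3p".
make_sed_script() {
    sed 's/$/p/' "$1"
}

tmp=$(mktemp -d)
printf '1\n3\n5\n' > "$tmp/num"
make_sed_script "$tmp/num"
# prints:
# 1p
# 3p
# 5p
```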
Steve Weet
+2  A: 

cat -n foo | join foo2 - | cut -d" " -f2-

where foo is your file with lines to print and foo2 is your file of line numbers. Note that join expects both inputs to be sorted on the join field.

frankc
similar, but probably slower (textfile and lines are the 2 files): cat -n textfile | grep -f lines | cut -d' ' -f2
dblu
That one is going to print the wrong stuff. If the lines file has 3 it will print line 3, 13, 23 etc, plus lines where 3 just happens to be part of the original input
frankc
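
One way to avoid those false matches is to anchor each number to the line-number prefix. This is only a sketch: the function name print_exact_lines and the sample data are hypothetical, and it uses grep -n '' rather than cat -n so that the prefix has the predictable form N: at the start of the line.

```shell
#!/bin/sh
# print_exact_lines is a hypothetical name.
# $1 = file of line numbers, $2 = text file.
print_exact_lines() {
    patterns=$(mktemp)
    # Turn each number N into the anchored pattern ^N: so 3 cannot match 13 or 23.
    sed 's/.*/^&:/' "$1" > "$patterns"
    # grep -n with an empty pattern prefixes every line with "N:";
    # cut then strips that prefix again, keeping any later colons.
    grep -n '' "$2" | grep -f "$patterns" | cut -d: -f2-
    rm -f "$patterns"
}

# Sample data, made up for this sketch: 13 rows, and we ask for line 3 only.
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 13; i++) print "row" i }' > "$tmp/textfile"
printf '3\n' > "$tmp/lines"
print_exact_lines "$tmp/lines" "$tmp/textfile"
# prints:
# row3
```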
A: 
$ cat input
every
good
bird
does
fly

$ cat lines
2
4

$ perl -ne 'BEGIN{($a,$b) = `cat lines`} print if $.==$a .. $.==$b' input
good
bird
does

If that's too much for a one-liner, use

#! /usr/bin/perl

use warnings;
use strict;

sub start_stop {
  my($path) = @_;
  open my $fh, "<", $path
    or die "$0: open $path: $!";

  local $/;
  return ($1,$2) if <$fh> =~ /(\d+)\s+(\d+)/;  # \s+ so a single number such as "12" is not split into 1 and 2
  die "$0: $path: could not find start and stop line numbers";
}

my($start,$stop) = start_stop "lines";

while (<>) {
  print if $. == $start .. $. == $stop;
}

Perl's magic open allows for creative possibilities such as

$ ./lines-between 'tac lines-between|'
  print if $. == $start .. $. == $stop;
while (<>) {

Greg Bacon
+1  A: 

Here is a way to do this in Perl without slurping anything so that the memory footprint of the program is independent of the sizes of both files (it does assume that the line numbers to be printed are sorted):

#!/usr/bin/perl

use strict; use warnings;
use autodie;

@ARGV == 2
    or die "Supply src_file and filter_file as arguments\n";

my ($src_file, $filter_file) = @ARGV;

open my $src_h, '<', $src_file;
open my $filter_h, '<', $filter_file;

my $to_print = <$filter_h>;

while ( my $src_line = <$src_h> ) {
    last unless defined $to_print;
    if ( $. == $to_print ) {
        print $src_line;
        $to_print = <$filter_h>;
    }
}

close $filter_h;
close $src_h;

Generate the source file:

C:\>  perl -le "print for aa .. zz" > src

Generate the filter file:

C:\> perl -le "print for grep { rand > 0.75 } 1 .. 52" > filter
C:\> cat filter
4
6
10
12
13
19
23
24
28
44
49
50

Output:

C:\> f src filter
ad
af
aj
al
am
as
aw
ax
bb
br
bw
bx

To deal with an unsorted filter file, you can modify the while loop:

while ( my $src_line = <$src_h> ) {
    last unless defined $to_print;
    if ( $. > $to_print ) {
        seek $src_h, 0, 0;
        $. = 0;
    }
    if ( $. == $to_print ) {
        print $src_line;
        $to_print = <$filter_h>;
    }
}

This would waste a lot of time if the contents of the filter file are fairly random because it would keep rewinding to the beginning of the source file. In that case, I would recommend using Tie::File.

Sinan Ünür
A: 

Here is a way to do this using Tie::File:

#!/usr/bin/perl

use strict; use warnings;
use autodie;
use Tie::File;

@ARGV == 2
    or die "Supply src_file and filter_file as arguments\n";

my ($src_file, $filter_file) = @ARGV;

tie my @source, 'Tie::File', $src_file, autochomp => 0
    or die "Cannot tie source '$src_file': $!";

open my $filter_h, '<', $filter_file;

while ( my $to_print = <$filter_h> ) {
    print $source[$to_print - 1];
}

close $filter_h;

untie @source;
Sinan Ünür