From a related question asked by Bi, I've learnt how to print a matching line together with the line immediately below it. The code looks really simple:

#!perl
open(FH,'FILE');
while ($line = <FH>) {
    if ($line =~ /Pattern/) {
        print "$line";
        print scalar <FH>;
    }
}

I then searched Google for code that prints matching lines together with the lines immediately above them. The code that would partially suit my purpose is something like this:

#!perl

@array;
open(FH, "FILE");
while ( <FH> ) {
  chomp;
  $my_line = "$_";
  if ("$my_line" =~ /Pattern/) {
      foreach( @array ){
          print "$_\n";
      }
      print "$my_line\n"
  }
  push(@array,$my_line);
  if ( "$#array" > "0" ) {
    shift(@array);
  }
};

Problem is I still can't figure out how to do them together. Seems my brain is shutting down. Does anyone have any ideas?

Thanks for any help.

UPDATE:

I think I'm sort of touched. You guys are so helpful! Perhaps a little Off-topic, but I really feel the impulse to say more.

I needed a Windows program capable of searching the contents of multiple files and displaying the related information without having to open each file separately. I tried googling, and two apps, Agent Ransack and Devas, have proved to be useful, but they display only the lines containing the matched query, and I also want to peek at the adjacent lines. Then the idea of improvising a program popped into my head. Years ago I was impressed by a Perl script that could generate a Tomeraider-format copy of Wikipedia so that I could handily search Wiki on my Lifedrive, and I've also read somewhere on the net that Perl is easy to learn, especially for a guy like me who has no experience in any programming language. So I started teaching myself Perl a couple of days ago.

My first step was to learn how to do the same job as "Agent Ransack" does, and it proved to be not so difficult using Perl. I first learnt how to search the contents of a single file and display the matching lines by modifying an example from the book "Perl by Example", but I was stuck there. I became totally clueless as to how to deal with multiple files. No similar examples were found in the book, or probably I was too impatient. Then I tried googling again and was led here, where I asked my first question, "How can I search multiple files for a string pattern in Perl?", and I must say this forum is bloody AWESOME ;). Then I looked at more example scripts and came up with the following code yesterday, and it serves my original purpose quite well.

The code goes like this:

#!perl

$hits=0;
print "INPUT YOUR QUERY:";
chop ($query = <STDIN>);
$dir = 'f:/corpus/'; 
@files = <$dir/*>;
foreach $file (@files) {
    open   (txt, "$file");

    while($line = <txt>) {
        if ($line =~ /$query/i) {
            $hits++;
            print "$file \n $line";
            print scalar <txt>;
        }
    }
}
close(txt);
print "$hits RESULTS FOUND FOR THIS SEARCH\n";

In the folder "corpus", I have a lot of text files, including srt, pdf and doc files, with contents such as the following:

Then I dumped the body.

J'ai mis le corps dans une décharge.

I know you have a wire.

Je sais que tu as un micro.

Now I'll tell you the truth.

Alors je vais te dire la vérité.

Basically I just need to search for an English phrase and look at the French equivalent, so the script I finished yesterday is quite satisfying, except that it would be better if my script could display the line above in case I want to search for a French phrase and check the English. So I'm trying to improve the code. Actually I knew the "print scalar <txt>" line is buggy, but it is neat and does the job of printing the subsequent line (at least most of the time). I was even expecting ANOTHER SINGLE magic line that prints the previous line instead of the subsequent one :) Perl seems to be fun. I think I will spend more time trying to get a better understanding of it. And as suggested by daotoad, I'll study the code generously offered by you guys. Again, thank you guys!

+3  A: 

You always want to store the last line that you saw in case the next line has your pattern and you need to print it. Using an array like you did in the second code snippet is probably overkill.

my $last = "";
while (my $line = <FH>) {
  if ($line =~ /Pattern/) {
    print $last;
    print $line;
    print scalar <FH>;  # next line
  }
  $last = $line;
}
mobrule
If the pattern might appear on consecutive lines, then you might want to do this a little differently.
mobrule
Awesome! The code works like magic! Thanks thanks thanks!
Mike
I agree with @mobrule, but that can be fixed simply by changing the last two prints to `print $last = $line; print $line = <FH>;` and then putting the `$last = $line;` in an `else` block.
Chris Lutz
It's not magic, it's Perl!
mobrule
@Chris Lutz, the fix doesn't seem to be working. I tested it but it failed. I experimented with adding "$last=$line" under "print $line" but in vain. Any idea why?
Mike
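
Reading the following line inside the `if` consumes it, so the loop never tests that line against the pattern. One possible way around this, sketched below, is to peek at the following line and then rewind the filehandle with `tell` and `seek`. The helper name `match_with_neighbours` and the in-memory sample text are assumptions made for this illustration; they are not taken from any of the answers.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface invented for this sketch): collects
# each matching line together with the line before it and the line after it.
sub match_with_neighbours {
    my ($fh, $pattern) = @_;
    my @out;
    my $last = '';
    while (my $line = <$fh>) {
        if ($line =~ $pattern) {
            my $pos  = tell $fh;    # position where the following line starts
            my $next = <$fh>;       # peek at the following line
            seek $fh, $pos, 0;      # rewind so the loop reads that line again
            push @out, $last, $line, (defined $next ? $next : ());
        }
        $last = $line;
    }
    return @out;
}

# Demo on an in-memory file in which two consecutive lines match.
my $text = "one\ntwo XXX\nthree XXX\nfour\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print match_with_neighbours($fh, qr/XXX/);
close $fh;
```

With this approach every match gets both neighbours, so when two matches are adjacent the overlapping lines are printed twice. Whether that is desirable depends on the output format you want.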
+8  A: 

It will probably be easier just to use grep for this as it allows printing of lines before and after a match. Use -B and -A to print context before and after the match respectively. See http://ss64.com/bash/grep.html

Brian Rasmussen
I thought so too, but then the OP doesn't learn anything about Perl, except maybe **not** to use it for everything.
pavium
+1 for the right tool for the job. In this case Perl isn't the _best_ solution if `grep(1)` (disambiguate from Perl's `grep()` function) is available. Also, a similar (and more powerful (and written in Perl)) tool would be `ack(1)` which is an amazing little program.
Chris Lutz
the question I posted is only part of the several functionalities I wished to add to my app. I'm learning Perl without any experience in other languages. But well I see grep looks great! I've bookmarked the url.
Mike
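
For anyone who wants the same effect while still practising Perl, the behaviour of `grep -B`/`-A` can be approximated with a small buffer of earlier lines plus a countdown for later ones. This is only a rough sketch, not a reimplementation of grep: the helper name `grep_context` and its argument order are made up for the example, and it does not print the `--` separators that GNU grep puts between groups.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions): returns matching
# lines plus up to $before lines above and $after lines below each match,
# without repeating any line.
sub grep_context {
    my ($fh, $pattern, $before, $after) = @_;
    my (@out, @pending);
    my $remaining = 0;                     # "after" lines still owed
    while (my $line = <$fh>) {
        if ($line =~ $pattern) {
            push @out, @pending, $line;    # flush buffered "before" lines
            @pending   = ();
            $remaining = $after;
        }
        elsif ($remaining > 0) {
            push @out, $line;              # trailing context
            $remaining--;
        }
        else {
            push @pending, $line;          # possible leading context
            shift @pending while @pending > $before;
        }
    }
    return @out;
}

# Demo on an in-memory file.
my $text = join '', map { "$_\n" } 'alpha', 'beta XXX', 'gamma', 'delta', 'epsilon';
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print grep_context($fh, qr/XXX/, 1, 1);    # roughly: grep -B 1 -A 1 XXX
close $fh;
```

Lines already emitted as trailing context are never put back into the buffer, which is what keeps adjacent groups from overlapping.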
+1  A: 
grep -A 1 -B 1 "search line" FILE
kal
+1  A: 

If you don't mind losing the ability to iterate over a filehandle, you could just slurp the file and iterate over the array:

#!/usr/bin/perl

use strict; # always do these
use warnings;

my $range = 1; # number of context lines to print before and after each match

open my $fh, '<', 'FILE' or die "Error: $!";
my @file = <$fh>;
close $fh;

for (0 .. $#file) {
  if($file[$_] =~ /Pattern/) {
    my @lines = grep { $_ > 0 && $_ < $#file } $_ - $range .. $_ + $range;
    print @file[@lines];
  }
}

This might get horribly slow for large files, but is pretty easy to understand (in my opinion). Only when you know how it works can you set about trying to optimize it. If you have any questions about any of the functions or operations I used, just ask.

Chris Lutz
Terribly inefficient, but fairly easy to understand. For readability I'd replace the grep with something like `my $start = $_ - $range; $start = 0 unless $start >= 0;` and `my $end = $_ + $range; $end = $#file unless $end <= $#file;` and then do `print @file[$start..$end];`
daotoad
@daotoad - Too much functional hype has made me think that `grep()` is somehow easier/more readable. I agree that yours is definitely easier to grok.
Chris Lutz
this is still a little bit beyond me :( well, anyway I'm in the process of familiarizing myself with the very very basics and therefore I was thinking my questions would be better reserved for the later stage :) I really appreciate your answer.
Mike
@Mike - This isn't necessarily the best way to do it, but most of it is how people write modern Perl. If you're learning Perl from a book or older tutorial, for example, you may see `open FILE, "filename";` but Perl allows you to use a variable (like `$fh` ) instead of a filehandle (like `FILE` ) - an improvement, because variables can be scoped, while filehandles are global. Other than `grep()` and the `..` operator, there isn't much here that's terribly advanced. (And like @daotoad said, `grep()` isn't the best answer here.)
Chris Lutz
@Chris - Plz correct me if I'm wrong. (commenting from the "for (0 .. $#file)" line onward) # read the contents from the first line(index 0) to the last ($#file). #if the contents of a particular line on a particular line number ($_) match the pattern and this line number($_) is greater than 0 and less than the last line number, search through the numbers between the sum of this number minus $range and the sum of this number plus $range and then store them in a container named "@lines". Display the contents of each line that has their corresponding number stored in "@lines".
Mike
Why can't we simply use "my @lines = ($_ - $range .. $_ + $range);"?
Mike
Ahh, I'm starting to see the point of the grep function. Tested the code only to find the "my @lines = ($_ - $range .. $_ + $range);" line is buggy. It would print the unnecessary last line if the match is found on the first line. Yes, that's it!
Mike
Oh, no...the grep function is also buggy! It won't print the very matching line if the pattern is found on the first line of the file.
Mike
Ahh...the version modified by daotoad is tested bug free.
Mike
Looks like: if the pattern is found on the first line, $_ would be 0, and $_ minus $range would end up as something less than 0....then, bug is sure to come
Mike
should be my @lines = grep { $_ >= 0
Mike
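
The boundary handling discussed in these comments can be sketched as a small subroutine that clamps the range to the first and last index, along the lines daotoad suggested. The name `lines_with_context` and the sample data are assumptions for the example only.

```perl
use strict;
use warnings;

# Hypothetical helper: $lines is a reference to the slurped file,
# $range is the number of context lines wanted on each side of a match.
sub lines_with_context {
    my ($lines, $pattern, $range) = @_;
    my @out;
    for my $i (0 .. $#{$lines}) {
        next unless $lines->[$i] =~ $pattern;
        my $start = $i - $range;
        $start = 0 if $start < 0;                  # clamp at the first line
        my $end = $i + $range;
        $end = $#{$lines} if $end > $#{$lines};    # clamp at the last line
        push @out, @{$lines}[$start .. $end];
    }
    return @out;
}

# Demo: matches on the very first and very last line.
my @file = ("first XXX\n", "second\n", "third\n", "last XXX\n");
print lines_with_context(\@file, qr/XXX/, 1);
```

Clamping avoids both problems noted above: a negative index no longer wraps around to the end of the array, and a match on the first or last line is still printed.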
+6  A: 

Given the following input file:

(1:first) Yes, this one.
(2) This one as well (XXX).
(3) And this one.
Not this one.
Not this one.
Not this one.
(4) Yes, this one.
(5) This one as well (XXX).
(6) AND this one as well (XXX).
(7:last) And this one.
Not this one.

this little snippet:

open(FH, "<qq.in");
$this_line = "";
$do_next = 0;
while(<FH>) {
    $last_line = $this_line;
    $this_line = $_;
    if ($this_line =~ /XXX/) {
        print $last_line if (!$do_next);
        print $this_line;
        $do_next = 1;
    } else {
        if ($do_next) {
            print $this_line;
            $this_line = "";   # already printed, so never reuse it as the "previous" line
        }
        $do_next = 0;
    }
}
close (FH);

produces the following, which is what I think you were after:

(1:first) Yes, this one.
(2) This one as well (XXX).
(3) And this one.
(4) Yes, this one.
(5) This one as well (XXX).
(6) AND this one as well (XXX).
(7:last) And this one.

It basically works by remembering the last line read and, when it finds the pattern, it outputs it and the pattern line. Then it continues to output pattern lines plus one more (with the $do_next variable).

There's also a little bit of trickery in there to ensure no line is printed twice.

paxdiablo
+1 even though I'm not fond of the output format (I don't think you should have repeats, even though my answer does).
Chris Lutz
Yes, slight bug, fixed now :-)
paxdiablo
Please use lexical file handles and the 3 argument open. Even though in a short script like this there is no big reason to avoid globals, IMO, it is best to develop good habits through practice.
daotoad
You're welcome to submit a more correct answer, @daotoad :-) This one may seem ancient to you but that's because I really only use Perl for quick and dirty scripts, so have no need to use the more modern stuff. If I need a more complicated app, I tend to use Java. Your point is taken, however.
paxdiablo
Okay, I posted an "updated" version of your code. But you deserve the credit for the clear, efficient implementation. All I did was add minor tweaks.
daotoad
+2  A: 

Command line grep is the quickest way to accomplish this, but if your goal is to learn some Perl then you'll need to produce some code.

Rather than providing code, as others have already done, I'll talk a bit about how to write your own. I hope this helps with the brain-lock.

  • Read my previous answer on how to write a program; it gives some tips about how to start working on your problem.
  • Go through each of the sample programs you have, as well as those offered here, and add comments describing exactly what they do. Refer to the perldoc for each function and operator you don't understand. Your first example code has an error: if 2 lines in a row match, the line after the second match won't print. By error, I mean that either the code or the spec is wrong; the desired behavior in this case needs to be determined.
  • Write out what you want your program to do.
  • Start filling in the blanks with code.

Here's a sketch of a phase one write-up:

# This program reads a file and looks for lines that match a pattern.

# Open the file

# Iterate over the file
# For each line
#    Check for a match
#    If match print line before, line and next line.

But how do you get the next line and the previous line?

Here's where creative thinking comes in. There are many ways; all you need is one that works.

  • You could read in lines one at a time, but read ahead by one line.
  • You could read the whole file into memory and select previous and follow-on lines by indexing an array.
  • You could read the file and store the offset and length of each line, keeping track of which ones match as you go. Then use your offset data to extract the required lines.
  • You could read in lines one at a time. Cache your previous line as you go. Use readline to read the next line for printing, but use seek and tell to rewind the handle so that the 'next' line can be checked for a match.

Any of these methods, and many more, could be fleshed out into a functioning program. Depending on your goals and constraints, any one of them may be the best choice for that problem domain. Knowing which one to use will come with experience. If you have time, try two or three different ways and see how they work out.
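
As an illustration of the first idea in the list (reading ahead by one line), here is a minimal sketch that keeps a three-line window of previous, current and next line. The name `window_matches` and the sample text are invented for the example, and it is only one of many ways to flesh the idea out.

```perl
use strict;
use warnings;

# Hypothetical helper: slides a (previous, current, next) window over the file
# and collects all three lines whenever the current one matches.
sub window_matches {
    my ($fh, $pattern) = @_;
    my @out;
    my $prev;
    my $cur = <$fh>;
    while (defined $cur) {
        my $next = <$fh>;                  # read ahead by one line
        if ($cur =~ $pattern) {
            push @out, grep { defined $_ } ($prev, $cur, $next);
        }
        ($prev, $cur) = ($cur, $next);     # slide the window forward
    }
    return @out;
}

# Demo on an in-memory file.
my $text = "Then I dumped the body.\nI know you have a wire.\nNow I'll tell you the truth.\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print window_matches($fh, qr/wire/);
close $fh;
```

Because every line passes through the "current" slot, consecutive matches are all tested. Their shared neighbours are printed once per match.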

Good luck.

daotoad
Well, I would really like to say that I appreciate the thoughts behind your answer to this posting. I would like to say more but this comment box has the character input limit, so I updated my original posting. Anyways thanks.
Mike
+4  A: 

Here's a modernized version of Pax's excellent answer:

use strict;
use warnings;

open( my $fh, '<', 'qq.in') 
    or die "Error opening file - $!\n";

my $this_line = "";
my $do_next = 0;

while(<$fh>) {
    my $last_line = $this_line;
    $this_line = $_;

    if ($this_line =~ /XXX/) {
        print $last_line unless $do_next;
        print $this_line;
        $do_next = 1;
    } else {
        if ($do_next) {
            print $this_line;
            $this_line = "";   # already printed, so never reuse it as the "previous" line
        }
        $do_next = 0;
    }
}
close ($fh);

See Why is three-argument open calls with lexical filehandles a Perl best practice? for a discussion of the reasons for the most important changes.

Important changes:

  • 3 argument open.
  • lexical filehandle
  • added strict and warnings pragmas.
  • variables declared with lexical scope.

Minor changes (issues of style and personal taste):

  • removed unneeded parens from post-fix if
  • converted an if-not construct into unless.

If you find this answer useful, be sure to up-vote Pax's original.

daotoad
Technically, that's two arguments :-) But the main reason for the 3-arg one doesn't really exist here since you have total control over the filename. I'll take all those suggestion on board in future, the strict and warnings I usually add only when my initial versions don't behave :-) But the global file handle avoidance is a good one. Sorry 'bout the if's, they were originally 'if () {}' and I remembered the postfix version later when compressing the code. +1.
paxdiablo
@Pax, I can't believe I missed that edit! It's really 3 now. I agree that the reasons don't really apply in this script. Despite that fact, I would still write this code with the 3 arg form for the sake of consistency with my other code, and to reinforce a good practice. If there was a good reason to use the two arg form (not that I know of one) I would use it, and leave a comment as to why.
daotoad
+1  A: 

I am going to ignore the title of your question and focus on some of the code you posted because it is positively harmful to let this code stand without explaining what is wrong with it. You say:

code that can print matching lines with the lines immediately above them. The code that would partially suit my purpose is something like this

I am going to go through that code. First, you should always include

use strict;
use warnings;

in your scripts, especially since you are just learning Perl.

@array;

This is a pointless statement. With strict, you can declare @array using:

my @array;

Prefer the three-argument form of open unless there is a specific benefit in a particular situation to not using it. Use lexical filehandles because bareword filehandles are package global and can be the source of mysterious bugs. Finally, always check if open succeeded before proceeding. So, instead of:

open(FH, "FILE");

write:

my $filename = 'something';
open my $fh, '<', $filename
    or die "Cannot open '$filename': $!";

If you use autodie, you can get away with:

open my $fh, '<', 'something';
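
A minimal sketch of the autodie variant, assuming Perl 5.10.1 or later (where autodie ships with the core distribution). The sub name `first_line` and the file name are made up for the illustration.

```perl
use strict;
use warnings;
use autodie;    # open, close and similar built-ins now die on failure

# Hypothetical helper: returns the first line of the named file.
sub first_line {
    my ($filename) = @_;
    open my $fh, '<', $filename;    # no "or die" needed under autodie
    my $line = <$fh>;
    close $fh;
    return $line;
}

# Demo: the file is assumed not to exist, so open throws an exception.
eval { print first_line('no-such-file.txt'); 1 }
    or print "Caught: $@";
```

The error raised by autodie already names the file and the reason, so the explicit `or die` message becomes unnecessary.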

Moving on:

while ( <FH> ) {
  chomp;
  $my_line = "$_";

First, read the FAQ (you should have done so before starting to write programs). See What's wrong with always quoting "$vars"?. Second, if you are going to assign the line that you just read to $my_line, you should do it in the while statement so you do not needlessly touch $_. Finally, you can be strict compliant without typing any more characters:

while ( my $line =  <$fh> ) {
    chomp $line;

Refer to the previous FAQ again.

  if ("$my_line" =~ /Pattern/) {

Why interpolate $my_line once more?

      foreach( @array ){
          print "$_\n";
      }

Either use an explicit loop variable or turn this into:

print "$_\n" for @array;

So, you interpolate $my_line again and add the newline that was removed by chomp earlier. There is no reason to do so:

      print "$my_line\n"

And now we come to the line that motivated me to dissect the code you posted in the first place:

  if ( "$#array" > "0" ) {

$#array is a number. 0 is a number. > is used to check if the number on the LHS is greater than the number on the RHS. Therefore, there is no need to convert both operands to strings.

Further, $#array is the last index of @array and its meaning depends on the value of $[. I cannot figure out what this statement is supposed to be checking.
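
One plausible reading, assuming `$[` is left at its default of 0, is that the test keeps only the most recent line in @array: `$#array > 0` becomes true as soon as the array holds more than one element. Comparing the element count states that intent more directly. The helper name `remember` below is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: append $line to the history and drop the oldest
# entries so that at most $keep lines remain.
sub remember {
    my ($history, $line, $keep) = @_;
    push @{$history}, $line;
    # An array in numeric context yields its element count.
    shift @{$history} while @{$history} > $keep;
    return;
}

my @history;
remember(\@history, $_, 1) for qw(first second third);
print "count: ", scalar(@history), ", last index: $#history\n";
```

The element count is always one more than the last index (again with the default `$[`), which is why `$#array > 0` and `@array > 1` select the same cases.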

Now, your original problem statement was

print matching lines with the lines immediately above them

The natural question, of course, is how many lines "immediately above" the match you want to print.

#!/usr/bin/perl

use strict;
use warnings;

use Readonly;
Readonly::Scalar my $KEEP_BEFORE => 4;

my $filename = $ARGV[0];
my $pattern  = qr/$ARGV[1]/;

open my $input_fh, '<', $filename
    or die "Cannot open '$filename': $!";

my @before;

while ( my $line = <$input_fh> ) {
    my $is_match = $line =~ $pattern;    # test the raw text, not the line-number prefix
    $line = sprintf '%6d: %s', $., $line;
    print @before, $line, "\n" if $is_match;
    push @before, $line;
    shift @before if @before > $KEEP_BEFORE;
}

close $input_fh;
Sinan Ünür
Thank you very much for your advice and detailed explanation. Thank you!
Mike
I've written down the key points of your comments in my notebook. Thanks again!
Mike
@Mike: You are welcome.
Sinan Ünür