I am matching a pattern and getting the line number of the match using the $. variable.

I need to print the lines immediately before and after the line that matches the pattern, e.g.:

line1
line2
line3
line4
line5

After my pattern matches line3, I want to print line2 and line4.

How can I do this kind of pattern match in Perl? Can anyone help me?

Thanks in advance

Senthil

+3  A: 

You want what is normally called context. The easiest way to get context is to maintain it yourself with a variable:

#!/usr/bin/perl

use strict;
use warnings;

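my $old = '';    # previous line; stays empty if the match is on the first line
while (my $line = <DATA>) {
    if ($line =~ /line3/) {
        my $next = <DATA>;    # undef if the match is on the last line
        print $old, $line, defined $next ? $next : '';
        last;
    }
    $old = $line;
}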

__DATA__
line1
line2
line3
line4
line5

If you need more than one line of context, it is better to use an array:

#!/usr/bin/perl

use strict;
use warnings;

my $context = @ARGV ? shift @ARGV : 3;    # default to 3 lines; an explicit 0 is honored
if ($context < 0) {
    $context = 0;
}

my @old;
while (my $line = <DATA>) {
    if ($line =~ /line6/) {
        print @old, $line;
        for (1 .. $context) {
            my $next = <DATA>;
            last unless defined $next;    # ran out of input
            print $next;
        }
        last;
    }
    push @old, $line;
    #remove a line if we have more than we need
    if (@old > $context) {
        shift @old;
    }
}

__DATA__
line1
line2
line3
line4
line5
line6
line7
line8
line9
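
The script above stops at the first match. As a minimal sketch of the same buffer idea extended to every match, the following collects each matching line plus up to $context lines on either side. The name lines_with_context is hypothetical, and the in-memory filehandle is only there to keep the example self-contained; any readable filehandle should work.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every line matching $re plus up to $context
# lines before and after it, in input order, without repeating a line.
sub lines_with_context {
    my ($fh, $re, $context) = @_;
    my (@before, @out);
    my $after = 0;    # trailing lines still owed to the most recent match

    while (my $line = <$fh>) {
        if ($line =~ $re) {
            push @out, @before, $line;
            @before = ();    # already emitted, so do not emit again
            $after  = $context;
        }
        elsif ($after > 0) {
            push @out, $line;
            $after--;
        }
        else {
            push @before, $line;
            shift @before if @before > $context;
        }
    }
    return @out;
}

# Assumed sample input: line1 .. line9, read through an in-memory filehandle.
my $text = join '', map { "line$_\n" } 1 .. 9;
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print lines_with_context($fh, qr/line3|line8/, 1);
```

With one line of context this prints line2 through line4 and line7 through line9. Unlike grep -C, it does not print a -- separator between groups.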
Chas. Owens
+1  A: 

I realize you asked for a Perl solution, but here is a Unix grep solution anyway:

grep -C 1 line3 file.txt

outputs:

line2
line3
line4

From the grep manpage:

   -C NUM, --context=NUM
    Print  NUM lines of output context.  Places a line containing --
    between contiguous groups of matches.
toolic
But `grep`'s regex engine isn't as powerful as `perl`'s. To get the ease of `grep` with the power of `perl`'s regexes, use [`ack`](http://search.cpan.org/dist/ack/ack-base) instead: `ack -C 1 line3 file.txt`
Chas. Owens
To use Perl regular expression syntax with Unix grep, use `grep -P` (supported by GNU grep)
toolic
+2  A: 

With the entire file in a scalar, write your pattern so it captures the lines before and after line3. The /m modifier is especially useful:

Treat string as multiple lines. That is, change ^ and $ from matching the start or end of the string to matching the start or end of any line anywhere within the string.

The patterns below use the /x modifier that lets us add whitespace to make them look like what they're matching.

For example:

#! /usr/bin/perl

my $data = do { local $/; <DATA> };

my $pattern = qr/ ^(.+\n)
                  ^line3\n
                  ^(.+\n)
                /mx;

if ($data =~ /$pattern/) {
  print $1, $2;
}
else {
  print "no match\n";
}

__DATA__
line1
line2
line3
line4
line5

Output:

line2
line4

Remember that $ is an assertion: it doesn't consume any characters, so you have to match newline with a literal \n pattern.
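
A quick way to see this is to measure how many characters a match covers. This is only a sketch: match_length is a hypothetical helper, and it relies on the built-in @- and @+ arrays, which hold the start and end offsets of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: number of characters consumed by the match,
# or undef if the pattern does not match at all.
sub match_length {
    my ($string, $re) = @_;
    return $string =~ $re ? $+[0] - $-[0] : undef;
}

my $data = "line3\nline4\n";

print match_length($data, qr/^line3$/m),  "\n";    # 5: $ consumed nothing
print match_length($data, qr/^line3\n/m), "\n";    # 6: \n consumed the newline
```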

Also note that the pattern above lacks generality. It works fine for a line somewhere in the middle, but it will fail if you change line3 to line1 or line5.

For the line1 case, you could make the previous line optional with a ? quantifier:

my $pattern = qr/ ^(.+\n)?
                  ^line1\n
                  ^(.+\n)
                /mx;

As expected, this produces output of

line2
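
One caveat with the optional group: when it matches zero times, $1 is undef, so printing it under use warnings produces an "uninitialized value" warning. A minimal sketch of one way to guard against that, assuming the same five-line input as above (the @context array is just an illustrative name):

```perl
use strict;
use warnings;

my $data = join '', map { "line$_\n" } 1 .. 5;

my $pattern = qr/ ^(.+\n)?
                  ^line1\n
                  ^(.+\n)
                /mx;

my @context;
if ($data =~ $pattern) {
    # $1 is undef here because the optional group matched zero times,
    # so keep only the captures that actually matched
    @context = grep { defined $_ } ($1, $2);
}
print @context;
```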

But trying the same fix for the line5 case

my $pattern = qr/ ^(.+\n)?
                  ^line5\n
                  ^(.+\n)?
                /mx;

gives

no match

This is because after the final newline in the file (the one following line5), ^ has nowhere to match, but changing the pattern to

my $pattern = qr/ ^(.+\n)?
                  ^line5\n
                  (^.+\n)?
                /mx;

outputs

line4
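
The end-of-string behavior of ^ described above can be checked in isolation. This is a small sketch with made-up inputs: the same pattern is tried against a string that ends right after the newline and against one that has more text after it.

```perl
use strict;
use warnings;

# After the final newline there is no following line, so ^ cannot match there.
my $at_end    = "line5\n"      =~ /^line5\n^/m ? "match" : "no match";

# With more text after the newline, the same ^ succeeds.
my $mid_input = "line5\nline6" =~ /^line5\n^/m ? "match" : "no match";

print "$at_end, $mid_input\n";
```

Moving ^ inside the optional group, as in the last pattern, lets the whole group be skipped when ^ fails.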

We might stop here, but the asymmetry in the pattern is unpleasing. Why did it work for one case and not for the other? With line1, the first ^ matches the beginning of $data and then (.+\n)? matches nothing.

Remember: patterns quantified with ? or * always succeed because they're semantically the same as

  • zero times or one time
  • zero or more times

respectively, and anything can match zero times:

$ perl -le 'print scalar "abc" =~ /(?!)*/'
1

Although I can't think of a time I've ever seen it used this way, an {m,n} quantifier where m is zero, e.g.,

  • {0,100}
  • {0,}
  • {0}

will always succeed because m is the minimum number of repetitions. The {0} quantifier is a pathological case included for completeness.

All that was to show we more or less got lucky with the line1 case. ^ matched the very beginning, the ?-quantified pattern matched nothing, and then the next ^ also matched the very beginning of $data.

Restoring symmetry makes a cleaner pattern:

my $pattern = qr/ (^.+\n)?
                  ^line5\n
                  (^.+\n)?
                /mx;
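
As a rough sketch of how the symmetric pattern might be wrapped for any target line, the hypothetical neighbors sub below interpolates the target with \Q...\E so that regex metacharacters in it are taken literally. It assumes the neighboring lines are non-empty, since .+ will not match a blank line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (line before, line after) for the first line
# that is exactly $target. Either element is undef when there is no such
# neighbor; the list is empty when $target is not found.
sub neighbors {
    my ($data, $target) = @_;
    my $pattern = qr/ (^.+\n)?
                      ^\Q$target\E\n
                      (^.+\n)?
                    /mx;
    return $data =~ $pattern ? ($1, $2) : ();
}

my $data = join '', map { "line$_\n" } 1 .. 5;

for my $target (qw(line1 line3 line5)) {
    my ($before, $after) = neighbors($data, $target);
    print "[$target]\n", grep { defined $_ } ($before, $after);
}
```

For line3 this prints line2 and line4; for line1 and line5 only the one existing neighbor is printed.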
Greg Bacon
+1  A: 

Using Unix command-line power is great in such cases, and Perl embraces it. Try something like grep -A 1 or grep -B 1; they print the line after or before the match, respectively.

Noam
Oh, and although the solutions above will work, they are a hell of a lot harder to code and are not needed in such a case
Noam