ansaurus

Question

How do I match the line before and after a pattern match in Perl?

Answer 1

+3 A:

You want what is normally called context. The easiest way to get context is to maintain it yourself with a variable:

#!/usr/bin/perl

use strict;
use warnings;

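my $old = '';    # previous line; stays empty if the match is on the first line
while (my $line = <DATA>) {
    if ($line =~ /line3/) {
        # print the previous line, the match, and the next line (if there is one)
        print $old, $line, scalar(<DATA>) // '';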
        last;
    }
    $old = $line;
}

__DATA__
line1
line2
line3
line4
line5

If you need more than one line of context, it is better to use an array:

#!/usr/bin/perl

use strict;
use warnings;

my $context = shift(@ARGV) // 3;    # default to 3 lines, but allow an explicit 0
if ($context < 0) {
    $context = 0;
}

my @old;
while (my $line = <DATA>) {
    if ($line =~ /line6/) {
        print @old, $line;
        for (1 .. $context) {
            my $next = <DATA>;
            last unless defined $next;    # stop quietly at end of input
            print $next;
        }
        last;
    }
    push @old, $line;
    #remove a line if we have more than we need
    if (@old > $context) {
        shift @old;
    }
}

__DATA__
line1
line2
line3
line4
line5
line6
line7
line8
line9
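
Both programs above stop at the first match. If every match is wanted, one possible variation is sketched below. It assumes the whole input fits in memory as an array of lines, and the helper name grep_context is made up for this example; it is not a built-in or a module function.

```perl
use strict;
use warnings;

# Hypothetical helper: for every line matching $regex, return an array ref
# holding up to $n lines before it, the line itself, and up to $n lines after.
sub grep_context {
    my ($lines, $regex, $n) = @_;
    my @groups;
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ $regex;
        # clamp the window to the bounds of the array
        my $first = $i - $n < 0         ? 0         : $i - $n;
        my $last  = $i + $n > $#$lines  ? $#$lines  : $i + $n;
        push @groups, [ @{$lines}[$first .. $last] ];
    }
    return @groups;
}

my @lines = map { "line$_\n" } 1 .. 9;
for my $group (grep_context(\@lines, qr/line[16]/, 1)) {
    print @$group, "--\n";
}
```

Unlike the streaming loops, this trades memory for simplicity, so it is probably a poor fit for very large files.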

Chas. Owens 2010-09-04 08:40:33

Answer 2

+1 A:

I realize you asked for a Perl solution, but here is a Unix grep solution anyway:

grep -C 1 line3 file.txt

outputs:

line2
line3
line4

From the grep manpage:

   -C NUM, --context=NUM
    Print  NUM lines of output context.  Places a line containing --
    between contiguous groups of matches.

toolic 2010-09-04 11:44:38

But `grep`'s regex engine is not as good as `perl`'s. To get the ease of `grep` with the power of `perl`'s regexes, use [`ack`](http://search.cpan.org/dist/ack/ack-base) instead: `ack -C 1 line3 file.txt`

Chas. Owens 2010-09-04 13:28:53

To use Perl regular expression syntax with grep, use `grep -P` (a GNU grep option).

toolic 2010-09-04 14:26:58

Answer 3

+2 A:

With the entire file in a scalar, write your pattern so it captures the lines before and after line3. The /m modifier is especially useful:

Treat string as multiple lines. That is, change ^ and $ from matching the start or end of the string to matching the start or end of any line anywhere within the string.
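
As a quick illustration of that description, the sketch below counts how many places ^ and $ can match with and without /m. The helper name count_matches and the sample text are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times $regex matches in $text.
sub count_matches {
    my ($text, $regex) = @_;
    my $count = () = $text =~ /$regex/g;
    return $count;
}

my $text = "line1\nline2\nline3\n";

print count_matches($text, qr/^line/),  "\n";   # 1: ^ matches only at the start of the string
print count_matches($text, qr/^line/m), "\n";   # 3: ^ matches at the start of every line
print count_matches($text, qr/\d$/),    "\n";   # 1: $ matches only at the end of the string
print count_matches($text, qr/\d$/m),   "\n";   # 3: $ matches at the end of every line
```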

The patterns below use the /x modifier that lets us add whitespace to make them look like what they're matching.

For example:

#! /usr/bin/perl

my $data = do { local $/; <DATA> };

my $pattern = qr/ ^(.+\n)
                  ^line3\n
                  ^(.+\n)
                /mx;

if ($data =~ /$pattern/) {
  print $1, $2;
}
else {
  print "no match\n";
}

__DATA__
line1
line2
line3
line4
line5

Output:

line2
line4

Remember that $ is an assertion: it doesn't consume any characters, so you have to match newline with a literal \n pattern.
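
One way to see that $ consumes nothing is to look at where a match ends. In this small sketch (the helper name match_end is invented for the example), $+[0] is the offset just past the end of the last successful match:

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset just past the match, or undef.
sub match_end {
    my ($text, $regex) = @_;
    return $text =~ $regex ? $+[0] : undef;
}

my $text = "line3\nline4\n";

print match_end($text, qr/line3$/m), "\n";   # 5: the match stops before the newline
print match_end($text, qr/line3\n/), "\n";   # 6: the literal \n consumes the newline
```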

Also note that the pattern above lacks generality. It works fine for a line somewhere in the middle, but it will fail if you change line3 to line1 or line5.

For the line1 case, you could make the previous line optional with a ? quantifier:

my $pattern = qr/ ^(.+\n)?
                  ^line1\n
                  ^(.+\n)
                /mx;

As expected, this produces output of

line2

But trying the same fix for the line5 case

my $pattern = qr/ ^(.+\n)?
                  ^line5\n
                  ^(.+\n)?
                /mx;

gives

no match

This is because after the final newline in the file (the one following line5), ^ has nowhere to match, but changing the pattern to

my $pattern = qr/ ^(.+\n)?
                  ^line5\n
                  (^.+\n)?
                /mx;

outputs

line4
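
The behavior of ^ at the very end of the string can be checked in isolation. This is a minimal sketch; the helper name caret_after_newline is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: can /m's ^ match right after some newline in $text?
sub caret_after_newline {
    my ($text) = @_;
    return $text =~ /\n^/m ? 1 : 0;
}

print caret_after_newline("line5\n"),        "\n";   # 0: the only newline is the last character
print caret_after_newline("line5\nline6\n"), "\n";   # 1: a new line starts after the first newline
```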

We might stop here, but the asymmetry in the pattern is unpleasing. Why did it work for one case and not for the other? With line1, ^ matches the beginning of $data and then matches nothing for (.+\n)?.

Remember: patterns quantified with ? or * always succeed because they're semantically the same as

zero times or one time
zero or more times

respectively, and anything can match zero times:

$ perl -le 'print scalar "abc" =~ /(?!)*/'
1

Although I can't think of a time I've ever seen it used this way, an {m,n} quantifier where m is zero, e.g.,

{0,100}
{0,}
{0}

will always succeed because m is the minimum number of repetitions. The {0} quantifier is a pathological case included for completeness.
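
To make that concrete, here is a small sketch that tries each of these quantifiers on a character that does not occur in the string. The helper name matched_length is invented for the example; it reports how many characters the match consumed.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the matched text, or undef if there is no match.
sub matched_length {
    my ($text, $regex) = @_;
    return unless $text =~ $regex;
    return $+[0] - $-[0];
}

# "abc" contains no z, yet every pattern succeeds by matching zero times.
for my $regex (qr/z?/, qr/z*/, qr/z{0,100}/, qr/z{0,}/, qr/z{0}/) {
    my $len = matched_length('abc', $regex);
    print "$regex: ", defined $len ? "matched $len characters" : "no match", "\n";
}
```

Each pattern should report a successful match of length zero, which is exactly what "zero times" means.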

All that was to show we more or less got lucky with the line1 case. ^ matched the very beginning, the ?-quantified pattern matched nothing, and then the next ^ also matched the very beginning of $data.

Restoring symmetry makes a cleaner pattern:

my $pattern = qr/ (^.+\n)?
                  ^line5\n
                  (^.+\n)?
                /mx;
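
To check the symmetric pattern against the first, middle, and last lines in one run, something like the following sketch could be used. The helper name neighbors is hypothetical, and \Q...\E is added here only so that the target line is matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines before and after the line that is
# exactly $target, or an empty list when there is no such line.
sub neighbors {
    my ($data, $target) = @_;
    my $pattern = qr/ (^.+\n)?
                      ^\Q$target\E\n
                      (^.+\n)?
                    /mx;
    return unless $data =~ $pattern;
    return ($1, $2);    # either one is undef at the edges of the file
}

my $data = join '', map { "line$_\n" } 1 .. 5;

for my $target (qw(line1 line3 line5)) {
    my ($before, $after) = neighbors($data, $target);
    print "$target: before=", $before // "(none)\n",
          "after=",           $after  // "(none)\n";
}
```

Because a capture group that did not take part in the match leaves its variable undefined, the caller has to check for undef, which is done here with the // operator.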

Greg Bacon 2010-09-04 12:11:51

Answer 4

+1 A:

Using the power of the Unix command line is great in such cases, and Perl embraces it. Try something like grep -A 1 or grep -B 1; they give you the line after or before the match, respectively.

Noam 2010-09-04 12:18:52

Oh, and although the solutions above will work, they are a hell of a lot harder to code and are not needed in such a case.

Noam 2010-09-04 12:20:08
