With the entire file in a scalar, write your pattern so it captures the lines before and after line3. The /m modifier is especially useful:
Treat string as multiple lines. That is, change ^ and $ from matching the start or end of the string to matching the start or end of any line anywhere within the string.
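To make the effect of /m concrete, here is a minimal sketch that counts the positions where ^ can match, with and without the modifier. The helper name count_line_starts is hypothetical and only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count the positions in $text where ^ matches.
sub count_line_starts {
    my ($text, $multiline) = @_;
    my $re = $multiline ? qr/^/m : qr/^/;
    # The "countof" idiom: force list context, then take the count.
    my $count = () = $text =~ /$re/g;
    return $count;
}

my $text = "line1\nline2\nline3\n";
print count_line_starts($text, 0), "\n";   # without /m: 1
print count_line_starts($text, 1), "\n";   # with /m: 3
```

Note that even with /m the count is 3 and not 4: ^ does not match after a newline that is the last character of the string.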
The patterns below use the /x modifier, which lets us add whitespace so that they look like the text they're matching.
For example:
#! /usr/bin/perl
my $data = do { local $/; <DATA> };
my $pattern = qr/ ^(.+\n)
                  ^line3\n
                  ^(.+\n)
                /mx;
if ($data =~ /$pattern/) {
    print $1, $2;
}
else {
    print "no match\n";
}
__DATA__
line1
line2
line3
line4
line5
Output:
line2
line4
Remember that $ is an assertion: it doesn't consume any characters, so you have to match the newline with a literal \n in the pattern.
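One way to see that $ consumes nothing is to compare the length of what each pattern actually matched. This is a small sketch; matched_length is a hypothetical helper that uses the built-in @- and @+ offset arrays.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the text matched by $re, or undef on failure.
sub matched_length {
    my ($text, $re) = @_;
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    return $text =~ $re ? $+[0] - $-[0] : undef;
}

my $data = "line2\nline3\nline4\n";
print matched_length($data, qr/^line3$/m),  "\n";   # 5: the newline is not part of the match
print matched_length($data, qr/^line3\n/m), "\n";   # 6: the literal \n consumes it
```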
Also note that the pattern above lacks generality. It works fine for a line somewhere in the middle, but it will fail if you change line3 to line1 or line5.
For the line1 case, you could make the previous line optional with a ? quantifier:
my $pattern = qr/ ^(.+\n)?
                  ^line1\n
                  ^(.+\n)
                /mx;
As expected, this produces output of
line2
But trying the same fix for the line5 case
my $pattern = qr/ ^(.+\n)?
                  ^line5\n
                  ^(.+\n)?
                /mx;
gives
no match
This is because after the final newline in the file (the one following line5), ^ has nowhere to match, but changing the pattern to
my $pattern = qr/ ^(.+\n)?
                  ^line5\n
                  (^.+\n)?
                /mx;
outputs
line4
We might stop here, but the asymmetry in the pattern is unpleasing. Why did it work for one case and not for the other? With line1, ^ matches the beginning of $data and then (.+\n)? matches nothing.
Remember: patterns quantified with ? or * always succeed because they're semantically the same as
- zero times or one time
- zero or more times
respectively, and anything can match zero times:
$ perl -le 'print scalar "abc" =~ /(?!)*/'
1
Although I can't think of a time I've ever seen it used this way, an {m,n} quantifier where m is zero will always succeed because m is the minimum number of repetitions. The {0} quantifier is a pathological case included for completeness.
All that was to show we more or less got lucky with the line1 case. ^ matched the very beginning, the ?-quantified pattern matched nothing, and then the next ^ also matched the very beginning of $data.
Restoring symmetry makes a cleaner pattern:
my $pattern = qr/ (^.+\n)?
                  ^line5\n
                  (^.+\n)?
                /mx;
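As a rough sketch of how the symmetric pattern generalizes, it can be wrapped in a function that takes the target line as a parameter. The name neighbors and its return convention are assumptions made for illustration. Like the patterns above, it relies on .+ and so will not capture an empty neighboring line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines just before and just after the
# first line equal to $target. Either value is undef at an edge of $text,
# and the list is empty if $target is not found.
sub neighbors {
    my ($text, $target) = @_;
    my $pattern = qr/ (^.+\n)?
                      ^\Q$target\E\n
                      (^.+\n)?
                    /mx;
    return unless $text =~ $pattern;
    my ($before, $after) = ($1, $2);   # copy out of the match variables
    return ($before, $after);
}

my $data = join '', map { "line$_\n" } 1 .. 5;
for my $target (qw(line1 line3 line5)) {
    my ($before, $after) = neighbors($data, $target);
    chomp(my @show = map { $_ // "(none)\n" } $before, $after);
    print "${target}: before=$show[0] after=$show[1]\n";
}
```

With the five-line sample data this should report no previous line for line1, line2 and line4 around line3, and no following line for line5.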