I am editing a Perl file, but I don't understand this regexp comparison. Can someone please explain it to me?
if ($lines =~ m/(.*?):(.*?)$/g) { } ..
What happens here? $lines
is a line from a text file.
I am editing a Perl file, but I don't understand this regexp comparison. Can someone please explain it to me?
if ($lines =~ m/(.*?):(.*?)$/g) { } ..
What happens here? $lines
is a line from a text file.
(.*?)
captures any characters, but as few of them as possible.
So it looks for patterns like <something>:<somethingelse><end of line>
, and if there are multiple :
in the string, the first one will be used as the divider between <something>
and <somethingelse>
.
Break it up into parts:
$lines =~ m/ (.*?) # Match any character (except newlines)
# zero or more times, not greedily, and
# stick the results in $1.
: # Match a colon.
(.*?) # Match any character (except newlines)
# zero or more times, not greedily, and
# stick the results in $2.
$ # Match the end of the line.
/gx;
So, this will match strings like ":"
(it matches zero characters, then a colon, then zero characters before the end of the line, $1
and $2
are empty strings), or "abc:"
($1 = "abc"
, $2
is an empty string), or "abc:def:ghi"
($1 = "abc"
and $2 = "def:ghi"
).
And if you pass in a line that doesn't match (it looks like this would be if the string does not contain a colon), then it won't process the code that's within the brackets. But if it does match, then the code within the brackets can use and process the special $1
and $2
variables (at least, until the next regular expression shows up, if there is one within the brackets).
That line says to perform a regular expression match on $lines
with the regex m/(.*?):(.*?)$/g
. It will effectively return true
if a match can be found in $lines
and false
if one cannot be found.
An explanation of the =~
operator:
Binary "=~" binds a scalar expression to a pattern match. Certain operations search or modify the string $_ by default. This operator makes that kind of operation work on some other string. The right argument is a search pattern, substitution, or transliteration. The left argument is what is supposed to be searched, substituted, or transliterated instead of the default $_. When used in scalar context, the return value generally indicates the success of the operation.
The regex itself is:
m/ #Perform a "match" operation
(.*?) #Match zero or more repetitions of any characters, but match as few as possible (ungreedy)
: #Match a literal colon character
(.*?) #Match zero or more repetitions of any characters, but match as few as possible (ungreedy)
$ #Match the end of string
/g #Perform the regex globally (find all occurrences in $line)
So if $lines
matches against that regex, it will go into the conditional portion, otherwise it will be false
and will skip it.
There is a tool to help understand regexes: YAPE::Regex::Explain.
Ignoring the g
modifier, which is not needed here:
use strict;
use warnings;
use YAPE::Regex::Explain;
my $re = qr/(.*?):(.*?)$/;
print YAPE::Regex::Explain->new($re)->explain();
__END__
The regular expression:
(?-imsx:(.*?):(.*?)$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
See also perldoc perlre.
It was written by someone who either knows too much about regular expressions or not enough about the $'
and $` variables.
THis could have been written as
if ($lines =~ /:/) {
... # use $` ($PREMATCH) instead of $1
... # use $' ($POSTMATCH) instead of $2
}
or
if ( ($var1,$var2) = split /:/, $lines, 2 and defined($var2) ) {
... # use $var1, $var2 instead of $1,$2
}