ansaurus

Question

Answer 1

+6 A:

From the perlre manpage:

You may, however, wish to treat a string as a multi-line buffer, such that the "^" will match after any newline within the string (except if the newline is the last character in the string), and "$" will match before any newline. At the cost of a little more overhead, you can do this by using the /m modifier on the pattern match operator. (Older programs did this by setting $*, but this practice has been removed in perl 5.9.)

So make sure that $* or perhaps other predefined variables do not interfere.

That said, even an expression modified by $*, like /abcd^$/im (note the added "m" flag), will not match anything: the "^" only matches at the start of the string or right after a newline, and "abcd" does not end with one.
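To see why, it can help to list where a bare "^" matches under /m. This is only a small sketch for a modern perl; the sample string and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample: two lines, each terminated by a newline.
my $text = "ab\ncd\n";

# Collect every offset at which a bare "^" matches under /m.
my @bol_offsets;
while ($text =~ /^/mg) {
    push @bol_offsets, pos($text);
}

# Expect the start of the string and the position right after the first
# newline; the position after the trailing newline is not reported.
print "^ matches at offsets: @bol_offsets\n";

# Right after "abcd" the previous character is "d", not a newline,
# so "^" cannot match there even with /m.
my $matched = ("abcd\n\n" =~ /abcd^$/im) ? 1 : 0;
print "matched: $matched\n";
```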

Also, make sure that the regex isn't overloaded. If an imported package does something like this:

use overload;
# Runs when the package is imported; installs a handler that rewrites regex constants.
sub import {
    overload::constant(qr => sub { my $re = shift; $re =~ s/^abcd//; $re });
}

Then empty strings will match your regex.
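The effect can be reproduced in a single file. The sketch below is an assumption-laden stand-in: it calls overload::constant from a BEGIN block inside a bare block rather than from a real module's import(), and the variable names are invented for the example.

```perl
use strict;
use warnings;
use overload ();

# Compiled before any handler is installed: the pattern is used as written.
my $plain = ("" =~ /abcd^$/i) ? 1 : 0;

my $overloaded;
{
    # Stand-in for what an imported package's import() could do.
    # The handler only affects regexes compiled later in this block.
    BEGIN {
        overload::constant('qr' => sub {
            my ($source) = @_;          # text of the regex constant
            $source =~ s/^abcd//;       # silently drop the leading "abcd"
            return $source;
        });
    }

    # Written as /abcd^$/i, but compiled from the rewritten text "^$".
    $overloaded = ("" =~ /abcd^$/i) ? 1 : 0;
}

print "plain: $plain, overloaded: $overloaded\n";
```

Because the handler is lexically scoped, nothing at the place where the regex is written hints that it has been rewritten.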

Also, I don't know if that's how the regex appears in your code, and it may not be relevant, but to be on the safe side you shouldn't implicitly match against $_; specify the variable explicitly instead: "$str =~ /abcd^$/i;".

$_ is dynamically scoped, so if you have any function calls that may modify $_ between where you define it and where the regexp is, or if you add them later, you'll be in for a surprise :)
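A small sketch of that surprise. The helper log_something is hypothetical, and the pattern is simplified to /abcd/i so that there is something to match at all:

```perl
use strict;
use warnings;

# Hypothetical helper that carelessly assigns to the global $_.
sub log_something {
    $_ = "log line";
    return;
}

$_ = "abcd";
log_something();                              # $_ now holds "log line"
my $implicit = /abcd/i ? 1 : 0;               # tested against the clobbered $_

my $str = "abcd";
log_something();
my $explicit = ($str =~ /abcd/i) ? 1 : 0;     # unaffected by changes to $_

print "implicit: $implicit, explicit: $explicit\n";
```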

Inshallah 2009-07-19 14:09:05

I added a note that this is existing code I am trying to get removed. Along these lines, I did try injecting a \n, but so far that has not let me pass this check.

RandomNickName42 2009-07-19 14:12:37

If it's running with an older version of the interpreter, it could still possibly evaluate to true, unless you are positive that $* will never be modified. Please note that the absence of a literal $* is no indication of that because $str="*"; ${$str} = 1; will still modify $*.

Inshallah 2009-07-19 14:18:19

I'll fwd this to the test manager to see if we can shake something out and post a follow-up.

RandomNickName42 2009-07-19 14:21:54

Answer 2

+1 A:

If $* is set to true (in Perl prior to 5.9), /abcd^$/i is the same as /abcd^$/im.

I would re-write it /abcd$^$/im.

Basically what it does is look for 'abcd' at the end of a line, followed by a blank line.

Except that there needs to be something before '^' that captures the newline.

Brad Gilbert 2009-07-19 17:01:15

Neither /abcd^$/m nor /abcd$^$/m will match any of these: "abcd\n", "abcd\n\n", "abcd", "". The best I can come up with to make it match is /abcd.^$/ms ("s" to make "." match a newline). The "^" really only matches after a newline.
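A quick way to re-check this on a modern perl is sketched below. The pattern sources are kept in single-quoted strings with inline (?m) and (?ms) modifiers, an arrangement chosen for this example so that nothing in them is interpolated as a Perl variable; the helper name show is made up.

```perl
use strict;
use warnings;

# The four sample strings from the comment above.
my @inputs = ("abcd\n", "abcd\n\n", "abcd", "");

# Single-quoted pattern sources; modifiers are written inline.
my @sources = ('(?m)abcd^$', '(?m)abcd$^$', '(?ms)abcd.^$');

# Render a string with its newlines visible.
sub show {
    my ($s) = @_;
    $s =~ s/\n/\\n/g;
    return qq{"$s"};
}

my %hits;    # pattern source => inputs that matched
for my $src (@sources) {
    my $re = qr/$src/;
    $hits{$src} = [ grep { $_ =~ $re } @inputs ];
}

for my $src (@sources) {
    my @shown = map { show($_) } @{ $hits{$src} };
    printf "%-14s matches: %s\n", $src, @shown ? join(", ", @shown) : "nothing";
}
```

Only the third pattern should report a match, and only for "abcd\n\n": the "." consumes the first newline, "^" then matches after it, and "$" matches before the second one.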

Inshallah 2009-07-19 17:27:34

I assume that `abcd` is just standing in for the actual pattern, which may actually be able to match a newline.

Brad Gilbert 2009-07-19 17:54:34

You are right, that is possible, but I think the questioner would have pointed it out. I tried out "$/", but it doesn't seem to change anything.

Inshallah 2009-07-19 18:04:18

Answer 3

+4 A:

What's the intent of that regular expression? Maybe it's not doing the job correctly and we can fix that for you. What sort of data is it trying to match? Is it possible that the original coder was trying to match a literal ^? Which situations does it guard against?

In these sorts of situations, I find it's better to figure out what should be happening in the code rather than what actually is happening. The intent might be right but the implementation wrong. Bugs do happen. :)

You might consider adding a logging statement in the code it guards to see if it is ever triggered. With all of the special variables and overloading involved, you might not be able to merely look at the regex and figure out what it will do. If you see it triggered, you know you still need it. If it's never triggered, well, you still don't know.
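Such a logging statement might look roughly like the sketch below. The function check_input and the counter $guard_hits are hypothetical stand-ins for whatever the real code guards:

```perl
use strict;
use warnings;

my $guard_hits = 0;    # hypothetical counter, for illustration only

# Hypothetical stand-in for the code path the regex guards.
sub check_input {
    my ($str) = @_;
    if ($str =~ /abcd^$/i) {
        $guard_hits++;
        # Log enough to reproduce the case later.
        warn "guard regex matched input of length " . length($str) . "\n";
        return 1;
    }
    return 0;
}

check_input($_) for ("abcd", "abcd\n", "ABCD\n\n", "");
print "guard triggered $guard_hits time(s)\n";
```

In production the interesting part is what shows up in the log over time, not the result for a handful of hand-picked strings.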

brian d foy 2009-07-19 18:13:01

Answer 4

+3 A:

BTW, I thought I would point out use re 'debug' here. You can use it to see how Perl is compiling and matching your regexes:

$ perl -Mre=debugcolor -e '/abcd^$/'
Compiling REx "abcd^$"
Final program:
   1: EXACT <abcd> (3)
   3: BOL (4)
   4: EOL (5)
   5: END (0)
anchored "abcd"$ at 0 (checking anchored) minlen 4
Freeing REx: "abcd^$"

With m:

$ perl -Mre=debugcolor -e '/abcd^$/m'
Compiling REx "abcd^$"
Final program:
   1: EXACT <abcd> (3)
   3: MBOL (4)
   4: MEOL (5)
   5: END (0)
anchored "abcd"$ at 0 (checking anchored) minlen 4
Freeing REx: "abcd^$"

You can also try some sample data and be sure that nothing is matching:

$ perl -Mre=debugcolor -e '"not going to match" =~ /abcd^$/m'
Compiling REx "abcd^$"
Final program:
   1: EXACT <abcd> (3)
   3: MBOL (4)
   4: MEOL (5)
   5: END (0)
anchored "abcd"$ at 0 (checking anchored) minlen 4
Guessing start of match in sv for REx "abcd^$" against "not going to match"
Did not find anchored substr "abcd"$...
Match rejected by optimizer
Freeing REx: "abcd^$"

Here the match fails twice:

$ perl -Mre=debug -e '"abcd\nabcd\n\n" =~ /abcd^$/m'
...
anchored "abcd"$ at 0 (checking anchored) minlen 4
Guessing start of match in sv for REx "abcd^$" against "abcd%nabcd%n%n"
Found anchored substr "abcd"$ at offset 0...
Guessed: match at offset 0
Matching REx "abcd^$" against "abcd%nabcd%n%n"
   0 <> <abcd%nabcd>         |  1:EXACT <abcd>(3)
   4 <abcd> <%nabcd%n%n>     |  3:MBOL(4)
                                  failed...
   5 <abcd%n> <abcd%n%n>     |  1:EXACT <abcd>(3)
   9 <abcd%nabcd> <%n%n>     |  3:MBOL(4)
                                  failed...
Match failed
Freeing REx: "abcd^$"

Try running this yourself, as it's clearer when the color from debugcolor is used.

There is a man page for the re pragma (perldoc re).

jrockway 2009-07-21 09:49:36

Answer 5

+1 A:

The perlre documentation states

Embedded newlines will not be matched by ^ or $.

The literal pattern /abcd^$/ can never match: ^ matches only at the beginning of the string or, in multiline mode, after a newline, so the ^$ at the end of the pattern needs something before it that consumes an embedded newline.

With an older perl, similar patterns can match:

$ cat prog
#! /usr/local/bin/perl -w

$* = 1;
$_ = "AbC\n\n";
print /abc\n^$/i  ? "Match.\n" : "No match.\n";
print /abc\s*^$/i ? "Match.\n" : "No match.\n";

$ ./prog
Use of $* is deprecated at ./prog line 3.
Match.
Match.

Note the deprecation warning, even from the ancient perl-5.6.1; the 5.10.0 release removed support for $* altogether. So a match is possible, but on the pathological side.
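On a perl without $*, the same two patterns can be tried with an explicit /m instead. This is a sketch of that comparison, not a port of the program above; the variable names are invented:

```perl
use strict;
use warnings;

my $text = "AbC\n\n";

# With /m, "^" can match right after the first newline and "$" just before
# the second one.
my $m_newline = ($text =~ /abc\n^$/im)  ? 1 : 0;
my $m_space   = ($text =~ /abc\s*^$/im) ? 1 : 0;

# Without /m (and without $*), "^" only matches at the very start of the string.
my $plain_newline = ($text =~ /abc\n^$/i)  ? 1 : 0;
my $plain_space   = ($text =~ /abc\s*^$/i) ? 1 : 0;

print "with /m:    $m_newline $m_space\n";
print "without /m: $plain_newline $plain_space\n";
```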

Greg Bacon 2010-01-02 04:10:48


Does /abcd^$/i match anything in Perl?
