Perl is one of the things I never quite had the justification to get into. Unfortunately, I've got a very specific (looks like a bug to me) bit of Perl code, and I need to define its operation provably.

This code is already written and in production; I wish to have it removed.

I believe it's impossible for it to match successfully, but a guess on the subject (especially mine) is not good enough. It guards a code block {} which, unfortunately, has multiple conditions, so the block can be entered regardless of the state of this expression. There is a security issue if this does happen (a bug in itself if the state is undefined, IMHO), yet without a proof the impact/severity is rated lower (read: never going to get fixed).

Is it possible for /abcd^$/i to ever match successfully, including with null-byte insertions or any possible byte/binary data? I'd even go with some insane environmental attack (e.g. consuming all of the memory on the host to within 1 byte, causing the Perl expression evaluator to fail an unchecked 2-byte allocation in the run-time). Kudos for creativity.

+6  A: 

From the perlre manpage:

You may, however, wish to treat a string as a multi-line buffer, such that the "^" will match after any newline within the string (except if the newline is the last character in the string), and "$" will match before any newline. At the cost of a little more overhead, you can do this by using the /m modifier on the pattern match operator. (Older programs did this by setting $*, but this practice has been removed in perl 5.9.)

So make sure that $* or perhaps other predefined variables do not interfere.

That said, even a $*-modified expression like /abcd^$/im (note the added "m" flag) will not match anything, because "^" only matches at the start of the string or right after a newline, and neither position can come directly after "abcd".
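
As a rough sanity check (not a proof), a minimal sketch like the one below runs the pattern with and without /m against a handful of awkward inputs. The candidate list is only an assumption about what might be worth trying: newline variants, NUL bytes, CRLF, and the empty string.

```perl
use strict;
use warnings;

# Candidate inputs (illustrative only): plain text, newline variants,
# embedded NUL bytes, CRLF line endings and the empty string.
my @candidates = (
    "abcd", "ABCD", "abcd\n", "abcd\n\n", "\nabcd",
    "abcd\0", "\0abcd\n", "abcd\r\n", "", "\n",
);

my $matches = 0;
for my $str (@candidates) {
    $matches++ if $str =~ /abcd^$/i;     # the pattern as written
    $matches++ if $str =~ /abcd^$/im;    # same pattern in multi-line mode
}

print "matches: $matches\n";    # prints "matches: 0"
```

None of these can match, because the position right after a literal "abcd" is never the start of the string and is never preceded by a newline.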


Also, make sure that the regex isn't overloaded. If an imported package does something like this:

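use overload;
sub import {
    # Rewrite every regex constant in the importing scope: strip a leading "abcd".
    # A lexical copy is used so the handler does not clobber the global $_.
    overload::constant(qr => sub { my $re = shift; $re =~ s/^abcd//; $re });
}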

Then empty strings will match your regex.


Also, I don't know if that's how the regex appears in your code, and it may not be relevant, but just to be on the safe side, you shouldn't implicitly match against $_; specify the variable explicitly instead: "$str =~ /abcd^$/i;".

$_ is dynamically scoped, so if you have any function calls that may modify $_ between where you define it and where the regexp is, or if you add them later, you'll be in for a surprise :)
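
To illustrate the kind of surprise meant here, a small sketch follows. The helper name normalize is made up for the example; it stands in for any function that carelessly assigns to $_.

```perl
use strict;
use warnings;

# Hypothetical helper that carelessly assigns to the global $_.
sub normalize { $_ = "clobbered"; return; }

$_ = "abcd";
normalize();                               # $_ now holds "clobbered"
my $implicit = /abcd/ ? 1 : 0;             # implicit match sees the clobbered $_

my $str = "abcd";
normalize();                               # cannot touch the lexical $str
my $explicit = $str =~ /abcd/ ? 1 : 0;     # explicit match is unaffected

print "implicit: $implicit, explicit: $explicit\n";    # prints "implicit: 0, explicit: 1"
```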

Inshallah
I added a note that this is existing code I am trying to get removed. However, along these lines, I did try injecting a \n, without any luck so far in getting it to let me pass this check.
RandomNickName42
If it's running with an older version of the interpreter, it could still possibly evaluate to true, unless you are positive that $* will never be modified. Please note that the absence of a literal $* is no indication of that because $str="*"; ${$str} = 1; will still modify $*.
Inshallah
I'll fwd this to the test manager to see if we can shake something out and post a follow-up.
RandomNickName42
+1  A: 

/abcd^$/i is the same as /abcd^$/im, if $* was set to true (in Perl prior to 5.9).

I would re-write it /abcd$^$/im.

Basically what it does is look for 'abcd' at the end of a line, followed by a blank line.

Except that there needs to be something before '^' that captures the newline.

Brad Gilbert
Neither /abcd^$/m nor /abcd$^$/m will match any of these: "abcd\n", "abcd\n\n", "abcd", "". The best I can come up with to make it match is /abcd.^$/ms ("s" to make "." match a newline). The "^" really only matches after a newline.
Inshallah
I assume that `abcd` is just standing in for the actual pattern, which may actually be able to match a newline.
Brad Gilbert
You are right, that is possible, but I think the questioner would have pointed it out. I tried out "$/", but it doesn't seem to change anything.
Inshallah
+4  A: 

What's the intent of that regular expression? Maybe it's not doing the job correctly and we can fix that for you. What sort of data is it trying to match? Is it possible that the original coder was trying to match a literal ^? Which situations does it guard against?

In these sorts of situations, I find it's better to figure out what should be happening in the code rather than what actually is happening. The intent might be right but the implementation wrong. Bugs do happen. :)

You might consider adding a logging statement in the code it guards to see if it is ever triggered. With all of the special variables and overloading involved, you might not be able to merely look at the regex and figure out what it will do. If you see it triggered, you know you still need it. If it's never triggered, well, you still don't know.
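
A minimal sketch of that kind of instrumentation, assuming the regex can be tested on its own just before the guarded block. The names check_input and $hits are made up for the example, and warn stands in for whatever logger the production code uses.

```perl
use strict;
use warnings;

my $hits = 0;    # hypothetical counter, only here to make the sketch checkable

# Hypothetical stand-in for the regex condition of the guarded block.
sub check_input {
    my ($input) = @_;
    if ($input =~ /abcd^$/i) {
        $hits++;
        warn "regex guard matched unexpectedly\n";    # goes to STDERR
    }
    return;
}

check_input($_) for ("abcd", "abcd\n", "");
print "hits: $hits\n";    # prints "hits: 0"
```

Logging on the regex condition alone, rather than on the whole block, keeps the block's other conditions from muddying the result.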

brian d foy
+3  A: 

BTW, I thought I would point out use re 'debug' here. You can use it to see how Perl is compiling and matching your regexes:

$ perl -Mre=debugcolor -e '/abcd^$/'
Compiling REx "abcd^$"
Final program:
   1: EXACT <abcd> (3)
   3: BOL (4)
   4: EOL (5)
   5: END (0)
anchored "abcd"$ at 0 (checking anchored) minlen 4
Freeing REx: "abcd^$"

With m:

$ perl -Mre=debugcolor -e '/abcd^$/m'
Compiling REx "abcd^$"
Final program:
   1: EXACT <abcd> (3)
   3: MBOL (4)
   4: MEOL (5)
   5: END (0)
anchored "abcd"$ at 0 (checking anchored) minlen 4
Freeing REx: "abcd^$"

You can also try some sample data and be sure that nothing is matching:

$ perl -Mre=debugcolor -e '"not going to match" =~ /abcd^$/m'
Compiling REx "abcd^$"
Final program:
   1: EXACT <abcd> (3)
   3: MBOL (4)
   4: MEOL (5)
   5: END (0)
anchored "abcd"$ at 0 (checking anchored) minlen 4
Guessing start of match in sv for REx "abcd^$" against "not going to match"
Did not find anchored substr "abcd"$...
Match rejected by optimizer
Freeing REx: "abcd^$"

Here the match fails twice:

$ perl -Mre=debug -e '"abcd\nabcd\n\n" =~ /abcd^$/m'
...
anchored "abcd"$ at 0 (checking anchored) minlen 4
Guessing start of match in sv for REx "abcd^$" against "abcd%nabcd%n%n"
Found anchored substr "abcd"$ at offset 0...
Guessed: match at offset 0
Matching REx "abcd^$" against "abcd%nabcd%n%n"
   0 <> <abcd%nabcd>         |  1:EXACT <abcd>(3)
   4 <abcd> <%nabcd%n%n>     |  3:MBOL(4)
                                  failed...
   5 <abcd%n> <abcd%n%n>     |  1:EXACT <abcd>(3)
   9 <abcd%nabcd> <%n%n>     |  3:MBOL(4)
                                  failed...
Match failed
Freeing REx: "abcd^$"

Try running this yourself, as it's clearer when the color from debugcolor is used.

There is a man page for the re pragma (perldoc re).

jrockway
+1  A: 

The perlre documentation states

Embedded newlines will not be matched by ^ or $.

Literal /abcd^$/ can never match because ^ matches only at the beginning of the string or after a newline in multiline mode, so ^$ at the end of the pattern requires help getting past an embedded newline.

With an older perl, similar patterns can match:

$ cat prog
#! /usr/local/bin/perl -w

$* = 1;
$_ = "AbC\n\n";
print /abc\n^$/i  ? "Match.\n" : "No match.\n";
print /abc\s*^$/i ? "Match.\n" : "No match.\n";

$ ./prog
Use of $* is deprecated at ./prog line 3.
Match.
Match.

Note the deprecation warning from the ancient perl-5.6.1; the 5.10.0 release removed support for $* altogether. So a match is possible, but on the pathological side.
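
On a perl where $* is gone, the /m modifier plays the same role, so a rough equivalent of the program above might look like the sketch below. It uses the same similar patterns (not the literal /abcd^$/) and adds one line without /m for contrast.

```perl
use strict;
use warnings;

my $str = "AbC\n\n";

my @results = (
    $str =~ /abc\n^$/im  ? "Match" : "No match",    # \n consumes the newline before ^
    $str =~ /abc\s*^$/im ? "Match" : "No match",    # \s* backtracks to a single newline
    $str =~ /abc\n^$/i   ? "Match" : "No match",    # without /m, ^ only matches at offset 0
);

print "$_\n" for @results;    # prints "Match", "Match", "No match"
```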

Greg Bacon