Well, I tried and failed so, here I am again.

I need to match my abs path pattern.

 /public_html/mystuff/10000001/001/10/01.cnt

I am in taint mode etc..

#!/usr/bin/perl -Tw
use CGI::Carp qw(fatalsToBrowser);
use strict;
use warnings;
$ENV{PATH} = "bin:/usr/bin";
delete @ENV{qw(IFS CDPATH BASH_ENV ENV)};   # hash slice, so all four keys are removed

I need to open the same file a couple times or more and taint forces me to untaint the file name every time. Although I may be doing something else wrong, I still need help constructing this pattern for future reference.

my $file = "$var[5]";
if ($file =~ /(\w{1}[\w-\/]*)/) {
$under = "/$1\.cnt";
} else {
ErroR();
}

You can see by my beginner attempt that I am close to clueless.

I had to add the forward slash and extension to $1 due to my poorly constructed, but working, regex.

So, I need help learning how to fix my expression so $1 represents /public_html/mystuff/10000001/001/10/01.cnt

Could someone hold my hand here and show me how to make:

$file =~ /(\w{1}[\w-\/]*)/ match my absolute path /public_html/mystuff/10000001/001/10/01.cnt ?

Thanks for any assistance.

+6  A: 

Edit: Using $ in the pattern (as I did before) is not advisable here because it can match \n at the end of the filename. Use \z instead because it unambiguously matches the end of the string.
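
To make the difference concrete, here is a minimal sketch. The variable names ($body, $name and so on) are only illustrative and the input string is made up; the point is that $ accepts a filename with a trailing newline while \z rejects it:

```perl
use strict;
use warnings;

# Shared pattern body, based on the path layout in the question.
my $body = qr{/public_html/mystuff/[0-9]{8}/[0-9]{3}/[0-9]{2}/[0-9]{2}\.cnt};

# Illustrative input: a valid-looking path with a newline appended.
my $name = "/public_html/mystuff/10000001/001/10/01.cnt\n";

# $ can match just before a trailing newline, so this test passes.
my $dollar_matches = ( $name =~ /^${body}$/ ) ? 1 : 0;

# \z matches only at the very end of the string, so this test fails.
my $z_matches = ( $name =~ /^${body}\z/ ) ? 1 : 0;

print "with \$: $dollar_matches, with \\z: $z_matches\n";
```

If a name that slipped through the $ version were later handed to open, the newline would come along with it, which is why \z is the safer anchor for untainting.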

Be as specific as possible in what you are matching:

my $fn = '/public_html/mystuff/10000001/001/10/01.cnt';

if ( $fn =~ m!
    ^(
        /public_html
        /mystuff
        /[0-9]{8}
        /[0-9]{3}
        /[0-9]{2}
        /[0-9]{2}\.cnt
     )\z!x ) {
     print $1, "\n";
 }

Alternatively, you can reduce the vertical space taken by the code by putting what I assume to be a common prefix, '/public_html/mystuff', in a variable, combining the various components in a qr// construct (see perldoc perlop), and then using the conditional operator ?::

#!/usr/bin/perl

use strict;
use warnings;

my $fn = '/public_html/mystuff/10000001/001/10/01.cnt';
my $prefix = '/public_html/mystuff';
my $re = qr!^($prefix/[0-9]{8}/[0-9]{3}/[0-9]{2}/[0-9]{2}\.cnt)\z!;

$fn = $fn =~ $re ? $1 : undef;

die "Filename did not match the requirements" unless defined $fn;
print $fn, "\n";

Also, I cannot reconcile using a relative path as you do in

$ENV{PATH} = "bin:/usr/bin";

with using taint mode. Did you mean

$ENV{PATH} = "/bin:/usr/bin";
Sinan Ünür
Thank you @Sinan, I was not aware that the $ENV{PATH} was not correct. Everything was working properly, but that may have become an issue in the future. Thanks. I was real close on that pattern too in one attempt! I left out the $!x on the end and gave up in frustration, adapting the one in my question. Thanks again.
Jim_Bo
@Jim_Bo: I thought this was less cluttered but I'll combine the two.
Sinan Ünür
@Sinan, could you also put your original answer up there as well? It was a little more verbose, but helpful in my learning curve. Thanks.
Jim_Bo
THANKS @Sinan ;-)
Jim_Bo
The only suggestion I would have is to use File::Spec along with what you're already doing in this example.
genio
@genio, I luv modules but, in this case I wanted to avoid using any modules and keep it as small as possible. I would even like to get rid of the CGI module if I can. I am unaware if I can. I will try though.
Jim_Bo
@Jim_Bo: Don't get rid of CGI. The last thing you should be doing is trying to reinvent its functionality. However, for a cleaner, OO module, use http://search.cpan.org/perldoc/CGI::Simple
Sinan Ünür
@Sinan, I checked out cgi::simple thanks for the tip. I installed it. I was however, wondering why I would need to load any module for this "simple" script. Does using "CGI::Simple qw(-carp);" (if that is even the correct format?) offer more "graceful" failures or added security without actually calling it in the script somewhere? All this sub does is open a file and record a hit. I use "CGI::CARP -qw(fatals to browser);" only to view my failures. Am I wrong in removing it after everything "works"? I do not want to display any errors to the browser after that.
Jim_Bo
Well, you did not show what the program did. However, read http://faq.perl.org/perlfaq5.html#I_still_don_t_get_lo And, yes, you should not send errors to the browser in production.
Sinan Ünür
@Sinan.. Yeah, I need to be more specific in my questions. I am getting better, but I need to mention whether my question refers to a standalone script or a sub in a larger script. Sorry about that. Thanks for the link too. This is not actually a "page hit counter", though, but a logger that indicates whether an image is in fact being hotlinked. The "count" is just there as a basis of comparison in function. It is called by my htaccess file. I use my server's statistics gathering for more relevant data. This script is just a little standalone for a friend.
Jim_Bo
@Sinan.. I also would like to mention that the second version of your snippet above rocks.. I used a modified (pattern) version of it as an untaint sub in another script I was working on. The first version explained the mechanics to me while the second showed me the light. Works great! THANK U! Perl gives you everything, how you use it is limited only to your imagination which is directly contingent upon your knowledge.
Jim_Bo
@Jim_Bo you are welcome. Glad it worked.
Sinan Ünür
+6  A: 

You talk about untainting the file path every time. That's probably because you aren't compartmentalizing your program steps.

In general, I break up these sorts of programs into stages. One of the earlier stages is data validation. Before I let the program continue, I validate all the data that I can. If any of it doesn't fit what I expect, I don't let the program continue. I don't want to get halfway through something important (like inserting stuff into a database) only to discover something is wrong.

So, when you get the data, untaint all of it and store the values in a new data structure. Don't use the original data or the CGI functions after that. The CGI module is just there to hand data to your program. After that, the rest of the program should know as little about CGI as possible.
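
As a rough sketch of that idea (not the asker's actual program): the sub name validate_input, the %clean hash, and the file key are all hypothetical, and the pattern is the one from the other answer. The input is checked once, and only the captured value is kept for the rest of the program:

```perl
use strict;
use warnings;

# Pattern assumed from the path layout shown in the question.
my $COUNTER_FILE =
    qr{^(/public_html/mystuff/[0-9]{8}/[0-9]{3}/[0-9]{2}/[0-9]{2}\.cnt)\z};

# Hypothetical validation stage: check every input once, up front.
# Returns a hash of clean values, or dies before any real work starts.
sub validate_input {
    my (%raw) = @_;
    my %clean;

    # A capture group from a successful match is not tainted under -T.
    if ( defined $raw{file} and $raw{file} =~ $COUNTER_FILE ) {
        $clean{file} = $1;
    }
    else {
        die "Invalid file name\n";
    }

    return %clean;
}

# Later stages use only %clean, never the raw input.
my %clean = validate_input(
    file => '/public_html/mystuff/10000001/001/10/01.cnt',
);
print "$clean{file}\n";
```

With this shape, the file can be opened as many times as needed using $clean{file}, because the untainting happened exactly once in the validation stage.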

I don't know what you are doing, but it's almost always a design smell to take actual filenames as input.

brian d foy