tags:

views:

144

answers:

3

Perl (no modules loaded and -Tw & strict) I found much info on regex and pattern matching here but not exactly what I need. I want to know if this is the correct way to validate a couple things. Sorry about the beginner attempt here. I am quite new to this.

my $this = "12345678";

if ($this != m/\b[0-9]{8}\b/x) { print "$this is bad"; }

my $that = "12345678.gif";

if ($that != m/\b[0-9]{8}\.gif\b/x) { print "$that is bad"; }

or

if ($that != m/\b[0-9]{8}\.(jpe?g|gif|png)\b/x) { print "$that is bad"; }


my ($ext) = $that =~ m/\.([^\.]+)$/x;

# verify extension
if  ($ext != m/\.(jpe?g|png|gif)$/x ){ print "$ext is bad"; }

# for content type
if ($ext eq "jpg") {$ext = "jpeg";}

I use the /x because perl::critic indicated I needed it. Passes with /x so...

\d is not an option and should be avoided from what I have read here.

The file name "that" has to be 8 digits + an image type. The other set of digits "this" is actually a folder name. This provides a little error checking for an image serving script. The root htaccess sends calls to images in a specific folder to said script. I grab the directory and image name off via path info.

I appreciating all advice past and present here... I have learned so much.

+2  A: 

Most of what you have above looks fine. A few points:

  • if ($ext != m/pattern/) is wrong - the != operator should be !~
  • \d is just fine, if you're parsing filenames or anything else not likely to be unicode
  • avoid /x unless you really need it (you're not splitting the regexp up over multiple lines for readability). For that matter, avoid any of the flags unless you need them
  • (jpe?g|gif|png) can be modified to (?:jpe?g|gif|png) to disable capturing on that set of parentheses (the efficiency increase is nominal in most cases, but it occasionally can make a difference e.g. in a rapid loop, so I make it a habit to not capture unless I need to)
  • you don't need to escape . inside a character class - i.e. [^\.] can be [^.] (I believe the only character you need to escape is ] itself, but don't take this as gospel) :)
  • it's "Perl", not "PERL" :)
Ether
You would also need to escape a hyphen (-) in a character class unless you put it at the end, a ^ unless it's not at the beginning, or a literal \ when you don't want it to create a double-quotish character or a character class shortcut.
brian d foy
I was told, to keep my script as "secure" as possible, to use [0-9] instead of \d so it would be less likely that an an undesirable gets passed. Thanks for the instructions on all that stuff. I am adding it all to my notes in my personal "Perl" code bible. Thanks so much!
Jim_Bo
Why the downvote?
Ether
+3  A: 

If you are running under taint checking, this isn't the way to do it. You need to match the pattern you need then remember that in a memory variable to clear the taint:

my $this = ...;

my $regex = qr/
    ^             # beginning of string
    (             # start of $1
   [0-9]{8}
   \.
   (gif|jpg)   # extension in $2
    )
    \z            #end of string
    /x;

my( $cleansed, $extension ) = do { 
    if( $this =~ m/$regex/ ) { ( $1, $2 ) }
 else                  { die "Bad filename!" }
 };

I'm not sure why you have a \b at the beginning of your regex. It probably doesn't do what you think it does. If you want the file name to be only the digits, use the ^ beginning of string anchor instead. That way, nothing can come before the digits. Likewise, the end of string anchor \z says that nothing can come after the extension.

If you then need to match an extension to a content-type for an HTTP response, which I'm guessing that your doing, you can use a hash to make the map:

 my %types = (
      jpg => jpeg,
      gif => gif,
      ...
      );

Now that you have the hash, you can use it as another level of validation:

 unless( exists $types{$extension} ) { die "Unsupported type!" }
brian d foy
Thank you. The /\b stuff \b/ was for word boundary. I really am new to this and I wanted to match only what was in between. I am learning though thanks to everyone here.
Jim_Bo
@brian Excellent! So, $cleansed == $1 (filename) and $extension == $2 (obvious) ?? If so, we may have total comprehension here people. Alert the media!
Jim_Bo
@Jim: yes, the last list evaluated in the do is the assigned to the variables.
brian d foy
@brian Ok, stop the press. Not "total" comprehension. If I make another (to check the directory is 8 digits), will that be $3? Also, instead of using ($1, $2) when passing the match, can I just call a sub or assign a var? eg if( $this =~ m/$regex/ ) { $step = "1"; GoTwo(); } I am not planning on that but, just want to verify my understanding of the mechanics. Also, do I have to have any modules turned on to do this like CGI qw/:simple/; ??
Jim_Bo
I think your getting away from what you realy need. It sounds like you just want to ensure the file exists. Instead of building on the current answer, maybe you should think about starting over by explaining what you need to do without thinking about how you want to do it.
brian d foy
I will check if it exists later, -e or -B. I really just want to verify my error checking here with this question. So, instead of putting ($1, $2) between the curlys I can just put something else like a call to an error() sub. My understanding of your above regex example is crucial to my learning curve. Sorry for my slowness. I was in a serious accident a few years back and the old noggin just ain't what she used to be. ;-) So, am here all day, every day trying to learn perl at the expense of everyones patience...
Jim_Bo
A: 

You need to use =~ and !~ instead of == and != for regex matching. Also after removing redundant code and optimizing, I would write it this way.

my $that = "12345678.gif";    
if ($that =~ m/\b[0-9]{8}\.(jpe?g|gif|png)\b/x)
{
    my $ext = $1; 
    if ($ext eq "jpg") {$ext = "jpeg";}
}
else
{
    print "$that is bad";
}
Nikhil
I don't think that \b does what you think it does.
brian d foy
Cool, Thank you!
Jim_Bo
@brian I "thought" it was a word boundary so "this" is only equal to /\bthis\b/ . I really was under that impression. Is how I used it negate its actual design? Thanks...
Jim_Bo
It is a word boundary. But try this: perl -e "$this = 'foo 12345678'; if ($this =~ m/\b[0-9]{8}\b/) { print 'phooey'; }"I think you'll see the problem...
Rini