Please help me understand this regex in detail. It is supposed to validate the filename from an ASP.NET FileUpload control so that only JPEG and GIF files are allowed. It was written by somebody else and I do not completely understand it. It works fine in Internet Explorer 7.0 but not in Firefox 3.6.

<asp:RegularExpressionValidator id="FileUpLoadValidator" runat="server" 
     ErrorMessage="Upload Jpegs and Gifs only." 
     ValidationExpression="^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$"
     ControlToValidate="LogoFileUpload">
</asp:RegularExpressionValidator>
+8  A: 

Here's a short explanation:

^               # match the beginning of the input
(               # start capture group 1
  (             #   start capture group 2
    [a-zA-Z]    #     match any character from the set {'A'..'Z', 'a'..'z'}
    :           #     match the character ':'
  )             #   end capture group 2
  |             #   OR
  (             #   start capture group 3
    \\{2}       #     match the character '\' and repeat it exactly 2 times
    \w+         #     match a word character: [a-zA-Z_0-9] and repeat it one or more times
  )             #   end capture group 3
  \$?           #   match the character '$' and match it once or none at all
)               # end capture group 1
(               # start capture group 4
  \\            #   match the character '\'
  (             #   start capture group 5
    \w          #     match a word character: [a-zA-Z_0-9] 
    [\w]        #     match any character from the set {'0'..'9', 'A'..'Z', '_', 'a'..'z'}
    .*          #     match any character except line breaks and repeat it zero or more times
  )             #   end capture group 5
)               # end capture group 4
(               # start capture group 6
  .             #   match any character except line breaks
  jpg           #   match the characters 'jpg'
  |             #   OR
  .             #   match any character except line breaks
  JPG           #   match the characters 'JPG'
  |             #   OR
  .             #   match any character except line breaks
  gif           #   match the characters 'gif'
  |             #   OR
  .             #   match any character except line breaks
  GIF           #   match the characters 'GIF'
)               # end capture group 6
$               # match the end of the input

EDIT

As some of the comments request: the above was generated by a little tool I wrote. You can download it here: http://www.big-o.nl/apps/pcreparser/pcre/PCREParser.html (WARNING: heavily under development!)

EDIT 2

It will match strings like these:

x:\abc\def\ghi.JPG
c:\foo\bar.gif
\\foo$\baz.jpg

Here's what the groups 1, 4 and 6 match individually:

group 1 | group 4      | group 6
--------+--------------+--------
        |              |
 x:     | \abc\def\ghi | .JPG
        |              |
 c:     | \foo\bar     | .gif
        |              |
 \\foo$ | \baz         | .jpg
        |              |

Note that it also matches a string like c:\foo\bar@gif since the DOT matches any character (except line breaks). And it will reject a string like c:\foo\bar.Gif (capital G in gif).
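
The claims above can be checked with a small script. This is only a sketch using Python's re module as a stand-in: the validator itself runs on the .NET and JavaScript regex engines, which are assumed to behave the same way for these plain ASCII samples.

```python
import re

# The validator's pattern, copied verbatim from the question.
ORIGINAL = re.compile(
    r"^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$"
)

samples = [
    r"x:\abc\def\ghi.JPG",
    r"c:\foo\bar.gif",
    r"\\foo$\baz.jpg",
    r"c:\foo\bar@gif",   # accepted: the unescaped dot matches '@'
    r"c:\foo\bar.Gif",   # rejected: mixed-case extension is not listed
]

# Map each sample to whether the whole string matches.
results = {s: ORIGINAL.match(s) is not None for s in samples}
for s, ok in results.items():
    print(f"{s:22} {'match' if ok else 'no match'}")

# Groups 1, 4 and 6 for the UNC example, as in the table above.
m = ORIGINAL.match(r"\\foo$\baz.jpg")
print(m.group(1), m.group(4), m.group(6))
```

The first four samples match and the last one does not, which agrees with the table and the note above.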

Bart Kiers
Can I ignorantly ask what tool you used for this?
Skilldrick
Bart K. Could you please post URL that allows to make this kind of parsing?
myforums
+1 detailed! I too would like to know if this was produced by a tool.
Pharabus
Good thing I reloaded so I didn't continue doing this same bloody thing. +++
Will
Bart K. I understand each element of this but not all linked together.
myforums
RegExBuddy does a good job as well as this site http://www.gskinner.com/RegExr/
Xaisoft
@myforums, see my second edit.
Bart Kiers
+1  A: 

It splits a filename into the parts driveletter, path, filename and extension.

Most probably IE uses backslashes while Firefox uses slashes. Try replacing the \\ parts with [\\\/] so the expression accepts both slashes and backslashes.
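
A rough way to try this substitution outside the browser is sketched below in Python. The name EITHER_SLASH and the sample values are made up for illustration. The last sample rests on an assumption that nothing in this thread confirms: that the browser may hand the validator only the bare filename, with no drive or directory at all.

```python
import re

# Hypothetical variant of the original pattern: every literal backslash
# (\\) is replaced by the character class [\\/], as suggested above.
EITHER_SLASH = re.compile(
    r"^(([a-zA-Z]:)|([\\/]{2}\w+)\$?)([\\/](\w[\w].*))(.jpg|.JPG|.gif|.GIF)$"
)

def accepts(value: str) -> bool:
    """Return True if the whole value matches the modified pattern."""
    return EITHER_SLASH.match(value) is not None

print(accepts(r"c:\foo\bar.gif"))                  # True
print(accepts("c:/foo/bar.gif"))                   # True
print(accepts("/home/user/pictures/myself.jpg"))   # False: no drive letter or double slash
print(accepts("bar.gif"))                          # False: bare filename, no path prefix
```

If the value being validated really is a bare filename, the slash style makes no difference, because the pattern still demands a drive letter or UNC host at the start.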

Matijs
Nope. Swapping \\ with [\\\/] did not help. Still not working in Firefox.
myforums
A: 

This is what Expresso says:

///  A description of the regular expression:
///  
///  Beginning of line or string
///  [1]: A numbered capture group. [([a-zA-Z]:)|(\\{2}\w+)\$?]
///      Select from 2 alternatives
///          [2]: A numbered capture group. [[a-zA-Z]:]
///              [a-zA-Z]:
///                  Any character in this class: [a-zA-Z]
///                  :
///          (\\{2}\w+)\$?
///              [3]: A numbered capture group. [\\{2}\w+]
///                  \\{2}\w+
///                      Literal \, exactly 2 repetitions
///                      Alphanumeric, one or more repetitions
///              Literal $, zero or one repetitions
///  [4]: A numbered capture group. [\\(\w[\w].*)]
///      \\(\w[\w].*)
///          Literal \
///          [5]: A numbered capture group. [\w[\w].*]
///              \w[\w].*
///                  Alphanumeric
///                  Any character in this class: [\w]
///                  Any character, any number of repetitions
///  [6]: A numbered capture group. [.jpg|.JPG|.gif|.GIF]
///      Select from 4 alternatives
///          .jpg
///              Any character
///              jpg
///          .JPG
///              Any character
///              JPG
///          .gif
///              Any character
///              gif
///          .GIF
///              Any character
///              GIF
///  End of line or string
///  

Hope this helps, Best regards, Tom.

tommieb75
A: 

You may need to implement server-side validation. Check out this article.

Solving the Challenges of ASP.NET Validation

Also, there are some good online tools for creating or interpreting regular expressions, but I suspect that the issue isn't with the expression.

Mark Maslar
+3  A: 

This is a bad regex.

^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$

Let's do it part by part.

([a-zA-Z]:)

This requires the file path to start with a drive letter such as C:, d:, etc.

(\\{2}\w+)\$?

\\{2} means the backslash repeated twice (note that the \ needs to be escaped), followed by some alphanumerics (\w+), and then maybe a dollar sign (\$?). This is the host part of a UNC path.

(([a-zA-Z]:)|(\\{2}\w+)\$?)

The | means "or", so the path starts with either a drive letter or a UNC host. Congratulations on kicking out non-Windows users.

(\\(\w[\w].*))

This should be the directory part of the path, but it actually matches a backslash, 2 alphanumerics, and then anything except newlines (.*), like \ab!@#*(#$*).

The proper regex for this part should be (?:\\\w+)+
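
The difference between the two directory patterns can be seen by matching each one against a whole string. This is a sketch in Python, whose re engine is assumed to agree with the validator's engine on these ASCII inputs. The helper name whole and the use of fullmatch are choices made for this illustration, not part of the original validator.

```python
import re

ORIGINAL_DIR = r"\\(\w[\w].*)"   # directory part as written in the validator
PROPOSED_DIR = r"(?:\\\w+)+"     # replacement suggested above

def whole(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

for text in (r"\abc\def", r"\a", r"\ab!@#*(#$*)"):
    print(f"{text:14} original={whole(ORIGINAL_DIR, text)} "
          f"proposed={whole(PROPOSED_DIR, text)}")
```

The original accepts \ab!@#*(#$*) but rejects the one-letter directory \a, while the proposed pattern does the opposite. Note that \w+ does not cover spaces or hyphens, so the proposed pattern only fits directory names made of word characters.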

(.jpg|.JPG|.gif|.GIF)$

This means the last 3 characters of the path must be jpg, JPG, gif or GIF. Note that . is not a dot, but matches anything except \n, so a filename like haha.abcgif or malicious.exe\0gif will pass.

The proper regex for this part should be \.(?:jpg|JPG|gif|GIF)$
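
A quick comparison of the two extension patterns, sketched in Python (the names LOOSE and STRICT are made up for this example; the NUL byte is written as \x00):

```python
import re

LOOSE = re.compile(r"(.jpg|.JPG|.gif|.GIF)$")     # unescaped dot, as in the original
STRICT = re.compile(r"\.(?:jpg|JPG|gif|GIF)$")    # escaped dot, as proposed above

names = ["logo.gif", "haha.abcgif", "malicious.exe\x00gif"]

# For each name: (accepted by LOOSE, accepted by STRICT)
verdicts = {
    name: (LOOSE.search(name) is not None, STRICT.search(name) is not None)
    for name in names
}
for name, (loose_ok, strict_ok) in verdicts.items():
    print(repr(name), "loose:", loose_ok, "strict:", strict_ok)
```

Only logo.gif passes the strict pattern, while all three names pass the loose one.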

Together,

^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$

will match

D:\foo.jpg
\\remote$\dummy\..\C:\Windows\System32\Logo.gif
C:\Windows\System32\cmd.exe;--gif

and will fail

/home/user/pictures/myself.jpg
C:\a.jpg
C:\d\e.jpg

The proper regex is /\.(?:jpg|gif)$/i. In addition, check on the server side whether the uploaded file really is an image.
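
A minimal sketch of both checks in Python, assuming the server has the submitted filename and the first few bytes of the upload available. The function names are hypothetical. The signature test only compares the leading bytes against the well-known JPEG and GIF headers, so it is a first filter rather than proof that the file is a valid image.

```python
import re

# Case-insensitive extension check, equivalent to /\.(?:jpg|gif)$/i.
EXTENSION = re.compile(r"\.(?:jpg|gif)$", re.IGNORECASE)

# Leading bytes of the two accepted formats:
# JPEG files start with FF D8 FF, GIF files with "GIF87a" or "GIF89a".
SIGNATURES = (b"\xff\xd8\xff", b"GIF87a", b"GIF89a")

def has_allowed_extension(filename: str) -> bool:
    """Check only the name; works for bare filenames and full paths alike."""
    return EXTENSION.search(filename) is not None

def looks_like_image(data: bytes) -> bool:
    """Check whether the content starts with a JPEG or GIF signature."""
    return data.startswith(SIGNATURES)

print(has_allowed_extension("photo.JPG"))                        # True
print(has_allowed_extension(r"C:\Windows\System32\cmd.exe;--gif"))  # False
print(looks_like_image(b"GIF89a\x01\x00\x01\x00"))               # True
print(looks_like_image(b"MZ\x90\x00"))                           # False
```

Because the extension pattern is not anchored at the start, it does not care about drive letters, UNC hosts, or slash direction.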

KennyTM
WOW! Thanks a lot for the details. This is what I was looking for, and it solves my problem. I am still curious why the original does not work in Firefox. Maybe that is a subject for a separate question, but it is probably not very relevant to the main subject here.
myforums
Sorry. Just found that '' does not work for 'C:\doc\My Pictures\cat-fish.gif'
myforums