I'm becoming increasingly aware that there must be major differences in the way browsers interpret regular expressions.
As an example, a co-worker had written this regular expression to validate that a file being uploaded has a PDF extension:

^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.pdf)$

This works in Internet Explorer, and in Google Chrome, but does NOT work in Firefox. The test always fails, even for an actual PDF. So I decided that the extra stuff was irrelevant and simplified it to:

^.+\.pdf$

and now it works fine in Firefox, as well as continuing to work in IE and Chrome.
Is this a quirk specific to asp:FileUpload and RegularExpressionValidator controls in ASP.NET, or is it simply due to different browsers supporting regex in different ways? Either way, what are some of the browser differences that you've encountered?

+1  A: 

If you're using JavaScript, not enclosing the regex in slashes causes an error in Firefox.

Try doing var regex = /^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.pdf)$/;
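As a rough illustration of why the delimiters matter, here is a minimal sketch of the two ways to build a pattern in JavaScript, using the simplified pattern from the question. The variable names and sample file names are only illustrative assumptions, not anything from the original page.

```javascript
// Regex literal: delimited by slashes, backslashes are written once.
var literal = /^.+\.pdf$/;

// RegExp constructor: the pattern is a string, so each backslash must be doubled.
var fromString = new RegExp("^.+\\.pdf$");

// With a single backslash, the string escape "\." collapses to ".",
// so the dot is no longer escaped and matches any character.
var unescaped = new RegExp("^.+\.pdf$");

console.log(literal.test("report.pdf"));    // true
console.log(fromString.test("report.pdf")); // true
console.log(literal.test("reportXpdf"));    // false
console.log(unescaped.test("reportXpdf"));  // true (probably not intended)
```

If the pattern is being passed around as a string rather than written as a literal, the doubled backslashes are the detail most likely to go wrong.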

Mauricio
+3  A: 

As far as I know, Firefox doesn't give you the full path of an upload. Interpretation of regular expressions seems irrelevant in this case. I have yet to see any difference between modern browsers in regular expression execution.

artificialidiot
A: 

I have not noticed a difference between browsers with regard to the pattern syntax. However, I have noticed a difference between C# and JavaScript, as C#'s implementation allows back references and JavaScript's implementation does not.

BoltBait
+4  A: 

Regarding the actual question: The original regex requires the value to start with a drive letter or UNC device name. It's quite possible that Firefox simply doesn't include that with the filename. Note also that, if you have any intention of being cross-platform, the regex would fail on any non-Windows system, regardless of browser, as they don't use drive letters or UNC paths. Your simplified regex ("accept anything, so long as it ends with .pdf") is about as good a filename check as you're going to get.
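To make the difference concrete, here is a small sketch that runs both patterns from the question against a full Windows path, a bare file name, and a Unix-style path. The sample values are made up for illustration; what a given browser actually reports for the upload field is an assumption here, not something this snippet verifies.

```javascript
// The two patterns quoted in the question, as regex literals.
var original = /^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.pdf)$/;
var simplified = /^.+\.pdf$/;

// Hypothetical values an upload field might report.
var fullPath = "C:\\docs\\report.pdf";    // the text C:\docs\report.pdf
var bareName = "report.pdf";              // file name only, no path
var unixPath = "/home/user/report.pdf";   // no drive letter or UNC prefix

console.log(original.test(fullPath));   // true
console.log(original.test(bareName));   // false
console.log(original.test(unixPath));   // false
console.log(simplified.test(fullPath)); // true
console.log(simplified.test(bareName)); // true
console.log(simplified.test(unixPath)); // true
```

The original pattern only succeeds when the value begins with a drive letter or a UNC prefix, which is consistent with the test failing wherever only the file name is available.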

However, Jonathan's comment to the original question cannot be overemphasized. Never, ever, ever trust the filename as an adequate means of determining its contents. Or the MIME type, for that matter. The client software talking to your web server (which might not even be a browser) can lie to you about anything and you'll never know unless you verify it. In this case, that means feeding the received file into some code that understands the PDF format and having that code tell you whether it's a valid PDF or not. Checking the filename may help to prevent people from trying to submit obviously incorrect files, but it is not a sufficient test of the files that are received.
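As a very rough first step toward checking content rather than the name, one could look for the "%PDF-" signature that PDF files start with. The sketch below is written in JavaScript only to match the rest of this thread; the function name `looksLikePdf` is hypothetical, and passing this check does not prove the file is a valid PDF. Real validation would still need code that parses the format.

```javascript
// Returns true if the byte sequence begins with the ASCII signature "%PDF-".
// This is a necessary check only; it does not validate the rest of the file.
function looksLikePdf(bytes) {
  var signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < signature.length) {
    return false;
  }
  for (var i = 0; i < signature.length; i++) {
    if (bytes[i] !== signature[i]) {
      return false;
    }
  }
  return true;
}

// Example inputs: the bytes of "%PDF-1.4" and of "hello".
var pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x34]);
var plainText = new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f]);

console.log(looksLikePdf(pdfHeader)); // true
console.log(looksLikePdf(plainText)); // false
```

The same comparison is easy to port to whatever server-side language receives the upload, which is where it has to run to be of any use against a lying client.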

(I realize that you may know about the need for additional validation, but the next person who has a similar situation and finds your question may not.)

Dave Sherohman
Your comment about cross-platform compatibility is a solid one, and something the original coder obviously didn't take into account. It's not really an answer to the actual question asked, though, nor is the question about sufficient tests for file upload type (see my comment on the question).
Grank
My apologies for trying to provide a broad answer which addresses situations related to that which you specifically addressed, in case they might be relevant to others in the future. (Or to you, given that you didn't state whether you're strictly targeting Windows only or not.)
Dave Sherohman
+1  A: 

As Dave mentioned, Firefox does not give the path, only the file name. Also as he mentioned, the original regex doesn't account for differences between operating systems. I think the best check you could do would be to check if the file name ends with PDF. Note that this doesn't ensure it's a valid PDF, just that the file name ends with PDF. Depending on your needs, you may want to verify that it's actually a PDF by checking the content.

Kibbee
+1  A: 

I believe JavaScript REs are defined by the ECMA standard, and I doubt there are many differences between JS interpreters. I haven't found any in my programs, or seen any mentioned in an article.

Your message is actually a bit confusing, since you throw ASP stuff in there. I don't see how you conclude it is the browser's fault when you talk about server-side technology or generated code. Actually, we don't even know if you are talking about JS in the browser, validation of the upload field (you can no longer do it, at least in a simple way, with FF3) or on the server side (neither FF nor Opera nor Safari uploads the full path of the uploaded file. I am surprised to learn that Chrome behaves like IE...).

PhiLho
There are definitely browser differences, though they may be subtle. I'm sorry that you found the message confusing; it was so because I didn't know what the problem was. It turned out that FF doesn't provide the full upload path. I don't know why you said that you can't validate that with FF3.
Grank
I meant that in FF3, JavaScript can no longer access the path in the file input field, something you could do in FF2. Neither version provides the full path, because it is, most of the time, useless on the server side, and even a security breach.
PhiLho