I'm using Fancy Upload 3, and when a file is selected (onSelect) I need to run a check to make sure the user doesn't have any bad characters in the filename. I'm currently getting people uploading files with hieroglyphics and such in the names.

What I need is to check if the filename only contains:

  1. A-Z
  2. a-z
  3. 0-9
  4. _ (underscore)
  5. - (minus)
  6. SPACE
  7. ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü (as single and double byte)

Obviously you can see the difficult part there: the non-English single- and double-byte chars.

I've seen this:

[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]

And this:

[\x80-\xA5]

But neither of them fully covers the situation correctly.

Examples that should work:

  1. fást.zip
  2. abc.zip
  3. ABC.zip
  4. Über.zip

Examples that should NOT work:

  1. ∑∑ø∆.zip
  2. ¡wow!.zip
  3. •§ªº¶.zip
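While trying candidate patterns, it can help to run them against all of the examples above at once. A minimal sketch; `findMismatches` is a hypothetical helper name, the ASCII-only pattern in the usage line is just an illustration, and the regex passed in should not use the `g` flag, because `RegExp.prototype.test` with `g` keeps state in `lastIndex` between calls:

```javascript
// The example file names from the question.
const shouldPass = ['fást.zip', 'abc.zip', 'ABC.zip', 'Über.zip'];
const shouldFail = ['∑∑ø∆.zip', '¡wow!.zip', '•§ªº¶.zip'];

// Returns every example the regex gets wrong: names that should pass but
// are rejected, followed by names that should fail but are accepted.
function findMismatches(regex) {
  return [
    ...shouldPass.filter((name) => !regex.test(name)),
    ...shouldFail.filter((name) => regex.test(name)),
  ];
}

// An ASCII-only pattern rejects every name containing an accented letter.
console.log(findMismatches(/^[A-Za-z0-9_. -]+$/)); // [ 'fást.zip', 'Über.zip' ]
```

An empty result means the pattern agrees with every example.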

The following is close, but I'm NO RegEx'pert, not even close.

var filenameReg = /^[A-Za-z0-9-_]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF]+$/;

Thanks in advance.


The solution from Zafer mostly works, but it does not catch all of the other symbols; see below.

Uncaught:

¡£¢§¶ª«ø¨¥®´åß©¬æ÷µç

Caught:

™∞•–≠'"πˆ†∑œ∂ƒ˙∆˚…≥≤˜∫√≈Ω

Regex:

var filenameReg = /^([A-Za-z0-9\-_. ]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])+$/;
A: 

The following should work:

var filenameReg = /^([A-Za-z0-9\-_. ]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])+$/;

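I've put a \ before the - and grouped the two expressions. Otherwise the | splits the whole pattern into ^[...] OR [...]+$, so the + sign doesn't apply to the first expression and only the first character gets checked.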

EDIT 1: I've also added . to the expression.
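To see why the grouping matters: `|` has the lowest precedence in a regex, so `^X|Y+$` means `(^X)` or `(Y+$)`, and the ungrouped pattern only had to match at the start or at the end. A minimal sketch with simplified classes (not the ones from the question); the non-capturing group `(?:...)` behaves the same as the `(...)` used above for a plain `test`:

```javascript
// Ungrouped: either "starts with a letter" OR "ends with digits".
const ungrouped = /^[a-z]|[0-9]+$/;
// Grouped: the whole string must consist of letters or digits.
const grouped = /^(?:[a-z]|[0-9])+$/;

console.log(ungrouped.test('a!!!'));  // true: only the first character is checked
console.log(ungrouped.test('!!!9'));  // true: only the end of the string is checked
console.log(grouped.test('a!!!'));    // false
console.log(grouped.test('abc123'));  // true
```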

Zafer
The problem is that this does not accept double-byte chars, like typing Alt+e and then e to get é (´ + e). Otherwise it works for the rest.
Tomas
Yes, it works. Check http://jsfiddle.net/Muyec/ . At least the tests pass. :)
Zafer
OK, I think I know why my tests did not work. I think it's because my JavaScript was NOT in a separate file, and the encoding was wrong because of the HTML document. Let me check that, since your example does work.
Tomas
Yes, encoding may be the issue. Changing the file encoding to UTF-8 may help.
Zafer
Hmm, the encoding is UTF-8. Checking a couple of other things.
Tomas
In case it helps anyone: it makes a difference where the value comes from. For some reason, pulling the value from an input field yielded slightly different results, but when the regex was applied in the upload script to the filename of the selected file, it worked perfectly.
Tomas
Actually, I just tested a bunch of others, and this is not catching many characters. Uncaught: ¡£¢§¶ª«ø¨¥®´åß©¬æ÷µç Caught: ™∞•–≠'"πˆ†∑œ∂ƒ˙∆˚…≥≤˜∫√≈Ω
Tomas
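A likely explanation for this split (a reading of JavaScript's regex syntax, not something confirmed in the thread): `\xHH` takes exactly two hex digits, unlike `\uHHHH`. So `\x00A0-\xD7FF` is read as `\x00`, `A`, the range `0-\xD7` (U+0030 to U+00D7), `F`, `F`, and the rest of the class breaks up the same way; overall it comes down to U+0000 plus U+0030 to U+00FF. That admits every Latin-1 symbol in the "uncaught" list and rejects everything above U+00FF. The sketch compares it with the `\u` form from the question; the test strings use `\u` escapes so the result does not depend on the script's file encoding:

```javascript
// The class as written in the answer: each \xHH consumes only two hex digits.
const asWritten = /^[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF]+$/;
// The ranges as intended, with four-digit \u escapes (from the question).
const intended = /^[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]+$/;

const pound = '\u00A3';   // £, from the "uncaught" list
const nArySum = '\u2211'; // ∑, from the "caught" list
const colon = ':';        // U+003A, inside the accidental 0-\xD7 range

console.log(asWritten.test(pound));   // true
console.log(asWritten.test(nArySum)); // false
console.log(asWritten.test(colon));   // true
console.log(intended.test(pound));    // true
console.log(intended.test(nArySum));  // true
console.log(intended.test(colon));    // false
```

So even with the escapes corrected, those ranges would let ∑ and most of the "caught" list through, because they span most of the Basic Multilingual Plane.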
A: 

Alternation between two character classes (i.e. [abc]|[def]) can be simplified to a single character class ([abcdef]): the first can be read as "(a or b or c) OR (d or e or f)"; the second as "(a or b or c or d or e or f)". What probably tripped up your regular expression is the unescaped dash in the first class; if you want a literal dash, it should be the last character in the class.

So we'll modify your expression to get it working:

var filenameReg = /^[A-Za-z0-9_\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF-]+$/;

The problem now is that you're not accounting for the file extension, but that is an easy modification (assuming you're always getting .zip files):

var filenameReg = /^[A-Za-z0-9_\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF-]+\.zip$/;

Replace zip with another pattern if the extension differs.
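If several extensions are allowed, the suffix can be built from a list. A sketch under stated assumptions: `buildFilenameRegex` is a hypothetical helper, the ASCII base class in the usage is a placeholder rather than the class above, and the `i` flag is an added choice so that `.ZIP` is accepted too:

```javascript
// Hypothetical helper: base-name character class + literal dot + one of the extensions.
function buildFilenameRegex(baseClass, extensions) {
  // Escape regex metacharacters so "tar.gz" matches a literal dot.
  const escaped = extensions.map((ext) => ext.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  return new RegExp(`^${baseClass}+\\.(?:${escaped.join('|')})$`, 'i');
}

const re = buildFilenameRegex('[A-Za-z0-9_ -]', ['zip', 'tar.gz']);
console.log(re.test('abc.zip'));         // true
console.log(re.test('ABC.ZIP'));         // true (case-insensitive)
console.log(re.test('my file.tar.gz'));  // true
console.log(re.test('abc.exe'));         // false
console.log(re.test('abc.zip.exe'));     // false
```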

Daniel Vandersluis
A: 

Different platforms have different rules, but I think you mean long file names on Windows. For that, you can use the following regex:

var longFilenames = @"^[^\./:*\?\""<>\|]{1}[^\/:*\?\""<>\|]{0,254}$";

NOTE: Instead of saying which characters are allowed, you need to say which ones are not allowed!

But keep in mind that this is not a 100% complete regex. If you really want to make it complete, you have to add exceptions for reserved names as well.

You can find more information about filename rules here:

http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx
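The regex above uses C# verbatim-string syntax (`@"..."`). For a JavaScript page, the same blacklist idea might look like the sketch below. It follows the rules on the linked MSDN page (forbidden characters `< > : " / \ | ? *` and control characters, reserved device names also when followed by an extension, no trailing space or period) rather than reproducing the regex above exactly; `isValidWindowsName` is a hypothetical name and the 255-character limit mirrors the `{0,254}` bound:

```javascript
// Characters Windows does not allow in a file name, plus ASCII control characters.
const FORBIDDEN = /[<>:"\/\\|?*\x00-\x1F]/;
// Reserved device names, alone or followed by an extension (e.g. "NUL.txt").
const RESERVED = /^(?:CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(?:\.|$)/i;

function isValidWindowsName(name) {
  return (
    name.length > 0 &&
    name.length <= 255 &&
    !FORBIDDEN.test(name) &&
    !RESERVED.test(name) &&
    !/[ .]$/.test(name) // must not end with a space or a period
  );
}

console.log(isValidWindowsName('report.zip'));  // true
console.log(isValidWindowsName('a:b.zip'));     // false
console.log(isValidWindowsName('NUL.txt'));     // false
console.log(isValidWindowsName('CONSOLE.txt')); // true
```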

Qorbani
No, I don't care about long filenames. I care about removing useless characters like £¡•ª¶∆˙√µ‘«Ω≈. Thanks though.
Tomas
A: 

It looks like the character ranges are causing the problem, because they include some disallowed characters in between. Since you already have the list of allowed characters, the best thing would be to use it directly:

var filenameReg = /^[A-Za-z0-9_\-\ ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü]+$/;
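Two details are worth checking with this pattern. The class has no `.`, so a full name such as `fást.zip` only matches once a dot is added (as in Zafer's answer) or the extension is checked separately. Also, the accented letters sit literally in the source, so they rely on the script being decoded with the right encoding. A sketch of the same whitelist with `\u` escapes and a `.` added; each range was picked to cover only letters from the question's list (for example `\u00C0-\u00C4` is ÀÁÂÃÄ and stops before Å):

```javascript
// Same letters as the question's list, written as escaped ranges:
// U+00C0-C4 ÀÁÂÃÄ, U+00C8-CF ÈÉÊËÌÍÎÏ, U+00D1-D6 ÑÒÓÔÕÖ, U+00D9-DD ÙÚÛÜÝ,
// and the lowercase counterparts, 0x20 higher. The dash is last, so it is literal.
const filenameReg = /^[A-Za-z0-9_ .\u00C0-\u00C4\u00C8-\u00CF\u00D1-\u00D6\u00D9-\u00DD\u00E0-\u00E4\u00E8-\u00EF\u00F1-\u00F6\u00F9-\u00FD-]+$/;

const names = [
  'f\u00E1st.zip',              // fást.zip
  '\u00DCber.zip',              // Über.zip
  '\u2211\u2211\u00F8\u2206.zip', // ∑∑ø∆.zip
  '\u00A1wow!.zip',             // ¡wow!.zip
];
for (const name of names) console.log(name, filenameReg.test(name));
// fást.zip true
// Über.zip true
// ∑∑ø∆.zip false
// ¡wow!.zip false
```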
casablanca
I was worried about single- and double-byte versions of the same characters. But since my page is UTF-8 encoded, there shouldn't be any chance of that, right?
Tomas
Right. The age of single and double byte characters is long gone. Unicode takes care of all these automatically. Also, FYI, JavaScript strings are UTF-16 regardless of your page encoding.
casablanca
I think this is the best solution as the ranges are incomplete and encoding shouldn't be an issue.
Tomas
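One caveat to the single/double-byte question: the same visible é can still arrive as two different code-point sequences, either precomposed U+00E9 or `e` followed by the combining acute accent U+0301 (older macOS HFS+ volumes, for example, store file names decomposed). A whitelist of precomposed letters rejects the decomposed form. A sketch that normalizes to NFC with `String.prototype.normalize` before testing; `isAllowedFilename` is a hypothetical helper, and the class repeats the escaped whitelist (with `.` added) so the block stands alone:

```javascript
// Whitelist: ASCII letters/digits, _ space . -, and the question's accented letters.
const allowed = /^[A-Za-z0-9_ .\u00C0-\u00C4\u00C8-\u00CF\u00D1-\u00D6\u00D9-\u00DD\u00E0-\u00E4\u00E8-\u00EF\u00F1-\u00F6\u00F9-\u00FD-]+$/;

function isAllowedFilename(name) {
  // Compose base letters with combining marks first (a + U+0301 -> U+00E1).
  return allowed.test(name.normalize('NFC'));
}

const decomposed = 'fa\u0301st.zip'; // looks like "fást.zip"
console.log(allowed.test(decomposed));      // false: U+0301 is not in the class
console.log(isAllowedFilename(decomposed)); // true
```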