Hi all,

I need to sanitize some data which will be used in file names. Some of the data contains spaces and ampersand characters. Is there a function which will escape or sanitize data so that it is suitable for use in a file name (or path)? I couldn't find one in the 'Filesystem Functions' section of the PHP manual.

So, assuming I have to write my own function, which characters do I need to escape (or change)?

+7  A: 

For Windows:

/ \ : * ? " < > |

For Unix, technically only / and the NUL byte are forbidden, but in practice the same list as Windows would be sensible.

There's nothing wrong with spaces or ampersands as long as you're prepared to use quotes on command lines when you're manipulating the files.

(BTW, I got that list by trying to rename a file on Windows to something including a colon, and copying from the error message.)
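
As a rough sketch of how the Windows list above could be applied in PHP: the function name `replace_windows_reserved` and the underscore replacement are assumptions made for illustration, not something from the PHP manual.

```php
<?php
// Hypothetical helper: replace the characters Windows rejects in file names.
function replace_windows_reserved(string $name, string $replacement = '_'): string
{
    // The nine characters from the Windows list: / \ : * ? " < > |
    $reserved = ['/', '\\', ':', '*', '?', '"', '<', '>', '|'];

    // str_replace accepts an array of search strings and one replacement.
    return str_replace($reserved, $replacement, $name);
}

// Spaces and ampersands are left alone, as they are legal in file names.
echo replace_windows_reserved('M & M: report?.txt'), "\n";
```

Replacing rather than deleting keeps the word boundaries visible in the result; pass an empty string as the second argument if you would rather drop the characters.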

RichieHindle
+2  A: 

When sanitizing strings for filenames, we filter out all characters below 0x20, as well as <, >, :, ", /, \, |, ?, and *
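
One way this filter might look in PHP, as a minimal sketch. The function name `strip_unsafe_chars` is made up for the example, and it assumes the characters should be removed rather than replaced.

```php
<?php
// Hypothetical helper: drop control characters and the reserved punctuation.
function strip_unsafe_chars(string $name): string
{
    // \x00-\x1F matches every byte below 0x20; the rest of the class is
    // < > : " / \ | ? * (the backslash needs escaping for PHP and for PCRE).
    return preg_replace('/[\x00-\x1F<>:"\/\\\\|?*]/', '', $name);
}

echo strip_unsafe_chars("quarterly\treport: draft?.txt"), "\n";
```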

garethm
+2  A: 

For Windows, add "&" to the list, if you don't want -any- side-effects. This is the character which says "the next character is my hotkey" in some displays of data. (Most common in old Windows, but still pops up here and there.) So instead of "M & M" you'd see "M _M" ... the character following the ampersand (a space) is a "hotkey", and thus underlined.

DreadPirateShawn
+2  A: 

It might be a good idea to remove everything outside [a-z0-9_\-.]. It's not necessary to be this strict, but it's comfortable to have a directory listing without any surprises. If you're working with unusual character sets, you may want to convert the encoding to plain ASCII before removing the offending characters (otherwise you might end up deleting everything) ...

At least that's how I do it :-)
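
A sketch of that strict approach, under a few assumptions: the input is UTF-8, the name `to_strict_filename` and the `unnamed` fallback are invented for the example, and upper-case letters are lowercased rather than removed. The transliteration step uses `iconv` with `//TRANSLIT`, whose output for non-ASCII input varies by platform and locale, so treat that part as best effort.

```php
<?php
// Hypothetical helper: transliterate to ASCII, then keep only [a-z0-9_.-].
function to_strict_filename(string $name): string
{
    // Best-effort conversion to ASCII; skipped if iconv is missing or fails.
    if (function_exists('iconv')) {
        $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $name);
        if ($ascii !== false) {
            $name = $ascii;
        }
    }

    // Lowercase first so that upper-case letters survive the filter.
    $name = strtolower($name);

    // Remove every character outside the allowed set.
    $name = preg_replace('/[^a-z0-9_.-]/', '', $name);

    // Guard against the "deleted everything" case mentioned above.
    return $name === '' ? 'unnamed' : $name;
}

echo to_strict_filename('My Report (final).TXT'), "\n";
```

Names such as "." or ".." would still pass this filter, so a caller building paths from the result may want to reject them separately.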

cube
+1  A: 

Instead of filtering out characters why not just allow [a-z0-9- !@#$%^()]? It is certainly easier than trying to guess every character that could potentially cause problems.

Your users shouldn't need a file with any other characters anyways, right?

Nick Presta
+3  A: 

If you have the opportunity to store the original name in a database, I would simply name the file after a random hash (mt_rand()/md5/sha1). The benefit is that you don't depend on the underlying OS (allowed characters, path length) or on the value or length of the user input, and it is also really hard to guess or forge a file name. A base64 encoding might be an option too, as long as you use a variant without "/", which standard base64 output can contain.
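
A minimal sketch of the idea, with some substitutions: it uses `random_bytes()` and `bin2hex()` (PHP 7 and later) instead of the mt_rand()/md5 combination, and a plain array stands in for the database table. The names `random_storage_name` and `$originalNames` are hypothetical.

```php
<?php
// Hypothetical helper: generate an opaque on-disk name for a stored file.
function random_storage_name(): string
{
    // 16 random bytes rendered as 32 lowercase hex characters.
    return bin2hex(random_bytes(16));
}

// Stand-in for a database table mapping stored name => original name.
$originalNames = [];

$stored = random_storage_name();
$originalNames[$stored] = 'M & M: report?.txt';

// The original name is only ever used for display, never as a path.
echo $stored, ' => ', $originalNames[$stored], "\n";
```

Because the stored name contains only hex digits, none of the character lists discussed in the other answers come into play.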

merkuro