views:

175

answers:

5

I have a PHP application.

I allow users to upload files to my web application.

Question: What's the best way for me to sanitize the file names of the uploaded documents $_FILES["filename"]["tmp_name"] in PHP?

UPDATE:

Can I take an MD5 of the uploaded filename and use that as the newly assigned filename? If so, how do I do that in PHP?

+1  A: 

I would just run a simple regex that replaces any non alphanumeric characters with an underscore (or just remove these character altogether). Make sure you preserve the extension of course.

If you want to go a bit further, you could use magic mime extension to ensure the file is the same format that the extension says it is.

EDIT: To avoid filename collisions in a directory, you could append a md5 of users IP + current time to the filename.

Sam Day
What happens if they upload 2 documents at the same time?
Add some additional entropy, use a counter for each file you process:
Sam Day
If $i is incremented for each file you process:`$filename = $sanitizedFileName . md5($_SERVER["REMOTE_ADDR"] . time() . $i) . $extension;`
Sam Day
A: 

If you're not against losing the actual filenames, what I usually do is create a hash of the filename and set the filename to that, if whatever you're developing has loads of pictures being uploaded it helps avoid conflicts where two filenames are named alike and overwrites occur.

hash('md5', $_FILES["filename"]["tmp_name"]);
jduren
Do you mean: **hash('md5', $_FILES["filename"]["tmp_name"] )** ?
Yes, the first argument you pass is the type of hash you would like to create and the second argument is the string you want to create the hash from. PHP Hash - http://www.php.net/manual/en/function.hash.php
jduren
I also would suggest adding Sam Days practice as well where you could add the current time to the filename before hashing, that would create an even more unique filename.
jduren
If you're going the 'just have a (pseudo)random filename'-route, using the `tempnam()`-function will automatically solve race-conditions.
Wrikken
You'd definitely need to add something before hashing. Hashing the filename alone won't prevent name collisions, because the same filename will always produce the same hash.
Ben Dunlap
@jduren, please update your anser to be: **hash('md5', $_FILES["filename"]["tmp_name"] )**. If you update the answer, I'll mark is a "accepted"
@Wrikken tempnam() actually creates a new file altogether doesn't it? Using a hashed unique string then just renaming the file would be more ideal.
jduren
No, that's the tmp filename PHP uses to when a user upload a file to the server.
@user401839 No, Wrikken mentioned a tempnam() function in his comment (4th comment down on my answer) and I am wondering in terms of performance, is using tempnam() which creates a new file altogether better versus just renaming the file with a hash of the original filename with the time or some other unique data appended.
jduren
Wrikken
@Wikkan any chance you can provide a useage example where you would use tempnam() to deal with this question for instance. I read a little off the PHP Man page for the tempnam() function and just don't see how tempnam is useful in this instance. As far as I can tell it creates a a temporary file with the option to set a prefix. But it creates it with the .tmp extension then returns the filename? Creating an entire new file just to get a unique string seems a bit overkill but I may be interpreting it incorrectly. Example?
jduren
Wrikken
A: 

I bet that you also store some information about the file in the database. If this is correct, then you can use the primary key (ID) as a filename on your server and preserve the original filename in the database. This gives you greater flexibility, because you can manipulate the metadata without renaming the actual file.

Adam Byrtek
+1  A: 

To avoid filename collision just check whether given or generated filename doesn't already exists:

do {
   // Generate filename, eg.:
   $filename = md5(uniqid()) . $fileExtension;
} while (file_exists($filename));

That gives you 100% sure that the filename is unique. Using md5 (or any other hash algorithm) ensures you that the filename is secure - and easy to handle.

Crozin
This makes no sense. You're taking the MD5 of the uniqid function. Why?
That's only example. You can use original filename etc. The point is that you should check in loop whether generated filename is already in use. If so regenerate filename.
Crozin
A: 

Instead of sanitizing filenames specified by the user, use any other unique identifier for that photo and store that as the filename. I prefer using user ID's which are numeric and always unique.

move_uploaded_file($_FILES["tmp_name"],"/home/yourname/".$user_id));

You can then fetch the image from any location (say, S3 or even your own server) by just knowing the user's ID. You don't even need a attribute in your database to store image URL's.

Gaurav Gupta