Hello All,

Hope you're having a great day/night.

I'm creating a file upload script and I'm looking for the best techniques and practices to validate uploaded files.

Allowed extensions are:

$allowed_extensions = array('gif','jpg','png','swf','doc','docx','pdf','zip','rar','rtf','psd');

Here's the list of what I'm doing.

  1. Checking file extension

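    $path_info = pathinfo($filename);
    // Lowercase the extension so "JPG" and "jpg" are treated the same
    $extension = isset($path_info['extension']) ? strtolower($path_info['extension']) : '';
    if (!in_array($extension, $allowed_extensions)) {
        die('File #'.$i.': Incorrect file extension.');
    }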

  2. Checking file mime type

    $allowed_mimes = array(
        'image/jpeg', 'image/png', 'image/gif', 'text/richtext', 'multipart/x-zip',
        'application/x-shockwave-flash', 'application/msword', 'application/pdf',
        'application/x-rar-compressed', 'image/vnd.adobe.photoshop'
    );
    if (!in_array(finfo_file($finfo, $file), $allowed_mimes)) {
        die('File #'.$i.': Incorrect mime type.');
    }

  3. Checking file size.

What should I do to make sure the uploaded files are valid? I noticed a strange thing: I changed a .jpg file's extension to .zip and... it was uploaded. I thought it would be rejected for having an incorrect mime type, but then I realized I'm not checking for the specific mime type that belongs to the extension, only whether the mime type exists anywhere in the array. I'll fix it later; that shouldn't be a problem for me (of course, if you've got a good solution or idea, please don't hesitate to share it).
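One way to close that gap, as a minimal sketch rather than a finished solution: keep a map from each extension to the MIME types accepted for it, and require the detected type to match the entry for that extension. The map contents and the function name `mime_matches_extension` are hypothetical; in the upload script the `$mime` argument would come from `finfo_file($finfo, $file)`.

```php
<?php
// Hypothetical map: each extension lists only the MIME types accepted for it.
$mime_by_extension = array(
    'gif' => array('image/gif'),
    'jpg' => array('image/jpeg'),
    'png' => array('image/png'),
    'pdf' => array('application/pdf'),
    'zip' => array('application/zip', 'multipart/x-zip'),
    'psd' => array('image/vnd.adobe.photoshop'),
);

// True only when the detected MIME type is listed for this particular extension.
function mime_matches_extension($extension, $mime, array $map)
{
    $extension = strtolower($extension);
    return isset($map[$extension]) && in_array($mime, $map[$extension], true);
}

// A .jpg renamed to .zip is still detected as image/jpeg, so it is rejected.
var_dump(mime_matches_extension('zip', 'image/jpeg', $mime_by_extension)); // bool(false)
var_dump(mime_matches_extension('jpg', 'image/jpeg', $mime_by_extension)); // bool(true)
```

Extensions missing from the map are rejected outright, so the remaining allowed extensions would each need their own entry.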

I know what to do with images (try to resize, rotate, crop, etc.), but I have no idea how to validate the other extensions.

Now it's time for my questions.

  1. Do you know good techniques to validate such files? Maybe I should unpack archives for .zip/.rar files, but what about documents (doc, pdf)?

  2. Will rotating and resizing work for .psd files?

  3. I thought that a .psd file had the mime type application/octet-stream, but when I tried to upload a .psd file it showed me image/vnd.adobe.photoshop. I'm a bit confused about this. So my question is: do files always have the same mime type?

Hope I asked about everything I wanted.

PS. Cannot force code block to work, any ideas? :)

Thanks in advance, Tom

+1  A: 

If you want to validate images, a good thing to do is use getimagesize() and see if it returns a valid set of sizes, or fails (it returns false) if it's an invalid image file. Or use a similar function for whatever files you are trying to support.
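As a rough sketch of that check (the function name `is_allowed_image` and the list of allowed types are just illustrative assumptions), something like this could work:

```php
<?php
// Allowed image types; the IMAGETYPE_* constants are built into PHP.
$allowed_image_types = array(IMAGETYPE_GIF, IMAGETYPE_JPEG, IMAGETYPE_PNG);

// Returns true when getimagesize() can parse the file as one of the allowed types.
function is_allowed_image($path, array $allowed_types)
{
    // getimagesize() returns false when it cannot read the file as an image;
    // @ hides the notice/warning it may raise for unreadable data.
    $info = @getimagesize($path);
    if ($info === false) {
        return false;
    }
    // Index 0 is the width, 1 is the height, 2 is the detected IMAGETYPE_* constant.
    return $info[0] > 0 && $info[1] > 0 && in_array($info[2], $allowed_types, true);
}
```

Note that getimagesize() mostly reads the header, so a file can pass this and still be damaged further in; actually resizing or re-saving the image is a stronger test.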

The key is that the file name means absolutely nothing. The file extensions (.jpg, etc), the mime types... are for humans.

The only way you can guarantee that a file is of the correct type is to open it and evaluate it byte by byte. That is, obviously, a pretty daunting task if you want to try to validate a large number of file types. At the simplest level, you'd look at the first few bytes of the file to ensure that they match what is expected of a file of that type.

GrandmasterB
Do you know any manual or document for analyzing first bytes?
Tom
+1  A: 

Hey Tom,

Lots of file formats have a pretty standard set of starting bytes to indicate the format. If you do a binary read for the first several bytes and test them against the start bytes of known formats it should be a fairly reliable way to confirm the file type matches the extension.

For example, JPEG's start bytes are 0xFF, 0xD8; so something like:

    $fp = fopen("filename.jpg", "rb");
    $startbytes = fread($fp, 8);
    fclose($fp);
    $exts = array();
    // Compare raw bytes as strings: a one-character string never equals the integer 0xFF
    if (substr($startbytes, 0, 2) === "\xFF\xD8") {
        $exts[] = "jpg";
        $exts[] = "jpeg";
    }

and then checking the uploaded file's extension against $exts could work.
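To extend the same idea to the other extensions in your list, here's a rough sketch using the commonly documented signatures for those formats. The table layout and the function name `starts_with_signature` are my own and not from any library. Keep in mind that docx files are ZIP containers and so share the ZIP signature, and that matching start bytes only show what the file claims to be, not that the rest of it is well-formed.

```php
<?php
// Leading byte signatures for the formats discussed in this thread.
$signatures = array(
    'jpg'  => array("\xFF\xD8\xFF"),
    'png'  => array("\x89PNG\r\n\x1A\n"),
    'gif'  => array("GIF87a", "GIF89a"),
    'pdf'  => array("%PDF-"),
    'zip'  => array("PK\x03\x04"),
    'docx' => array("PK\x03\x04"),   // docx is a ZIP container
    'rar'  => array("Rar!\x1A\x07"),
    'psd'  => array("8BPS"),
    'rtf'  => array("{\\rtf"),
    'doc'  => array("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"),
    'swf'  => array("FWS", "CWS"),
);

// True when the file starts with one of the signatures listed for the extension.
function starts_with_signature($path, $extension, array $signatures)
{
    $extension = strtolower($extension);
    if (!isset($signatures[$extension])) {
        return false; // unknown extension: nothing to compare against
    }
    $fp = @fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    $head = (string) fread($fp, 8); // the longest signature above is 8 bytes
    fclose($fp);
    foreach ($signatures[$extension] as $magic) {
        if (strncmp($head, $magic, strlen($magic)) === 0) {
            return true;
        }
    }
    return false;
}
```

You'd call it as `starts_with_signature($file, $path_info['extension'], $signatures)`, so a JPEG renamed to .zip fails because its first bytes are not `PK\x03\x04`.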

akellehe
So, correct me if I'm wrong: if a JPEG's start bytes are different from `0xFF, 0xD8`, it means the file is invalid, right? Is there any list of "starting bytes" out there? Or... how can I create it?
Tom
Here's a decent list: http://www.mikekunz.com/image_file_header.html. It's missing PNG, though its header is pretty consistent from what I've seen.
Collin Allen
@Collin Allen: Thanks a lot! Now I know what to search for.
Tom