Hey everyone,

I am working on a script that will process user uploads to the server, and as an added layer of security I'd like to know:

Is there a way to detect a file's true extension/file type, and ensure that it is not another file type masked with a different extension?

Is there a byte stamp or some unique identifier for each type/extension?

I'd like to be able to detect whether someone has applied a different extension to the file they are uploading.

Thank you,

A: 

In *nix, the first few bytes of the file (its "magic number") usually tell you. On Windows this is only sometimes true ("header info"). Ultimately, it is OS-dependent.
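To illustrate the "magic number" idea, here is a minimal PHP sketch (assuming PHP 7.1+) that reads the first bytes of a file and compares them with a few well-known signatures. The function name `detect_by_magic` and the short signature table are illustrative assumptions, nowhere near the complete database behind the `file` command.

```php
<?php
// Hypothetical helper: compare a file's leading bytes against a small,
// hand-picked table of well-known signatures ("magic numbers").
// The table is illustrative only, not exhaustive.
function detect_by_magic(string $path): ?string
{
    $signatures = [
        "\x89PNG\r\n\x1a\n" => 'image/png',
        "\xFF\xD8\xFF"      => 'image/jpeg',
        "GIF87a"            => 'image/gif',
        "GIF89a"            => 'image/gif',
        "%PDF-"             => 'application/pdf',
        "PK\x03\x04"        => 'application/zip',
    ];

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    $head = fread($handle, 16); // longest signature above is 8 bytes
    fclose($handle);
    if ($head === false) {
        return null;
    }

    foreach ($signatures as $magic => $type) {
        // Compare only as many bytes as the signature is long.
        if (strncmp($head, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }
    return null; // unknown: treat as "not one of the types we accept"
}
```

An unknown signature returns `null`, which for an upload filter is best treated as a rejection rather than a guess.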

gbarry
A: 

Executables generally have a "signature" in their first bytes; still, I find it hard to ascertain what a file's type really is.
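As a rough sketch of checking executable signatures, the hypothetical `looks_executable()` below flags the `MZ` prefix of Windows executables, the `\x7FELF` prefix of ELF binaries, and a `#!` script line. It only covers those three cases and can misfire on a text file that happens to start with "MZ", so treat it as one extra filter, not a verdict.

```php
<?php
// Hypothetical helper: flag files whose first bytes match common
// executable signatures. Only a few formats are covered here.
function looks_executable(string $path): bool
{
    // Read at most 4 bytes from offset 0; a failed read becomes "".
    $head = (string) file_get_contents($path, false, null, 0, 4);

    return strncmp($head, "MZ", 2) === 0         // DOS/Windows executables (.exe, .dll)
        || strncmp($head, "\x7FELF", 4) === 0    // ELF binaries (Linux and most Unixes)
        || strncmp($head, "#!", 2) === 0;        // scripts with a shebang line
}
```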

Otávio Décio
A: 

That could still be forged. I would ensure that you cannot (or do not) run a file uploaded to the server automatically.

I would also have a virus/spyware scanner, and let it do the work for you.
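One way to make sure an upload is never run automatically is to store it outside the document root, under a random name with no extension, and without execute permissions. This is a sketch under assumptions: `UPLOAD_DIR` and `store_upload()` are hypothetical names, and it needs PHP 7+ for `random_bytes()` and `??`.

```php
<?php
// Sketch: keep uploads where neither the web server nor a double-click will run them.
// UPLOAD_DIR is a hypothetical directory outside the document root.
const UPLOAD_DIR = '/var/app-uploads';

function store_upload(array $file): ?string
{
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        return null;
    }
    // Never reuse the client-supplied name; generate a random one with no extension.
    $name = bin2hex(random_bytes(16));
    $dest = UPLOAD_DIR . '/' . $name;

    // move_uploaded_file() refuses anything that did not arrive via HTTP POST.
    if (!move_uploaded_file($file['tmp_name'], $dest)) {
        return null;
    }
    chmod($dest, 0640); // owner read/write, group read, no execute bits
    return $name;
}
```

You would call it as `store_upload($_FILES['file'])` and keep the returned name (plus the original filename, if you need it) in your database.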

cbrulak
A: 

PHP has a couple of ways of reading file contents to determine its MIME type, depending on which version of PHP you are using:

Have a look at the Fileinfo functions if you're running PHP 5.3+

$finfo = finfo_open(FILEINFO_MIME_TYPE); // MIME type only, e.g. "audio/mpeg"
$type = finfo_file($finfo, $filepath);   // returns false on failure
finfo_close($finfo);

Alternatively, check out mime_content_type for older versions.

$type = mime_content_type($filepath);

Note that just validating the file type isn't enough if you want to be truly secure. Someone could, for example, upload a valid JPEG file which exploits a vulnerability in a common renderer. To guard against this, you would need a well-maintained virus scanner.

Paul Dixon
A: 

What file types do you expect? Maybe you could check that it conforms to what you expect and reject everything else.

armandino
I am expecting MP3 only, and judging by Karl's comment, there is a unique identifier.
barfoon
A: 

Not really, no.

You will need to read the first few bytes of each file and interpret them as a header for a finite set of known file types. Most files have distinct file headers: some sort of metadata in the first few bytes, or in the first few kilobytes in the case of MP3.

Your program will simply have to try parsing the file as each of your accepted file types.
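For the MP3-only case, a sketch of "try parsing for each accepted type" might look like this. `looks_like_mp3()` accepts either an ID3v2 tag (the ASCII bytes `ID3`) or an MPEG audio frame header whose sync bits are set and whose layer bits say Layer III. Both function names are hypothetical, and this is a plausibility check on the first bytes, not a full MP3 parser.

```php
<?php
// Sketch: a cheap MP3 plausibility check on the first bytes of a file.
function looks_like_mp3(string $head): bool
{
    // Files with an ID3v2 tag start with the ASCII bytes "ID3".
    if (strncmp($head, "ID3", 3) === 0) {
        return true;
    }
    if (strlen($head) < 2) {
        return false;
    }
    // Otherwise expect an MPEG audio frame header: 11 set sync bits,
    // a version other than the reserved value 01, and layer bits 01 (Layer III).
    $b0 = ord($head[0]);
    $b1 = ord($head[1]);
    $sync    = $b0 === 0xFF && ($b1 & 0xE0) === 0xE0;
    $version = ($b1 >> 3) & 0x03;
    $layer   = ($b1 >> 1) & 0x03;
    return $sync && $version !== 0x01 && $layer === 0x01;
}

// Try each accepted type's check in turn; the first match wins.
// $checks maps a MIME type to a callable that takes the header bytes.
function detect_accepted(string $head, array $checks): ?string
{
    foreach ($checks as $type => $check) {
        if ($check($head)) {
            return $type;
        }
    }
    return null;
}
```

You would read the header with something like `file_get_contents($path, false, null, 0, 10)` and call `detect_accepted($head, ['audio/mpeg' => 'looks_like_mp3'])`; accepting another type means adding another entry to the array.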

For my program, I send the uploaded image to ImageMagick in a try-catch block, and if it blows up, I assume it was a bad image. This should be considered insecure, because I am loading arbitrary (user-supplied) binary data into an external program, which is a common attack vector. Here, I am trusting ImageMagick not to do anything to my system.

I recommend writing your own handlers for the significant filetypes you intend to use, to avoid any attack vectors.
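As an example of a small hand-written handler, this hypothetical `parse_png_header()` checks the 8-byte PNG signature and that the first chunk is a 13-byte `IHDR` chunk with non-zero width and height. It deliberately stops there (no CRC checks, no decoding), so it is a sketch of the idea, not a validator to rely on alone.

```php
<?php
// Sketch of a minimal hand-written PNG "handler": it checks the 8-byte
// signature and that the first chunk is a well-formed IHDR header.
// It does not decode image data or verify CRCs.
function parse_png_header(string $data): ?array
{
    if (strlen($data) < 24 || strncmp($data, "\x89PNG\r\n\x1a\n", 8) !== 0) {
        return null;
    }
    // Chunk layout: 4-byte big-endian length, 4-byte type, then the data.
    $length = unpack('N', substr($data, 8, 4))[1];
    $type   = substr($data, 12, 4);
    if ($length !== 13 || $type !== 'IHDR') {
        return null;
    }
    // IHDR begins with width and height as 4-byte big-endian integers.
    $dims = unpack('Nwidth/Nheight', substr($data, 16, 8));
    if ($dims['width'] === 0 || $dims['height'] === 0) {
        return null;
    }
    return $dims;
}
```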

Edit: I see in PHP there are some tools to do this for you.

Also, the MIME type is what the user's browser claims the file to be. It is handy to read it and act on it in your code, but it is not a secure method, because anyone sending you bad files can easily fake the MIME headers. It's a sort of front-line defense that keeps code expecting a JPEG from barfing on a PNG, but if someone embeds a virus in an .exe and names it .jpg, there's no reason to assume they didn't spoof the MIME type as well.
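Because the `type` entry in `$_FILES` is whatever the client sent, one option is to treat it only as a hint and compare it with a content-based detection. This sketch assumes the Fileinfo extension is loaded (it provides `mime_content_type()`); `claimed_type_matches()` is a hypothetical name.

```php
<?php
// Sketch: the 'type' entry of $_FILES comes from the client's request,
// so cross-check it against what libmagic detects from the contents.
function claimed_type_matches(array $file): bool
{
    $claimed  = $file['type'] ?? '';
    $detected = mime_content_type($file['tmp_name']); // false on failure

    return $detected !== false && strcasecmp($claimed, $detected) === 0;
}
```

A mismatch does not prove an attack, since browsers and libmagic sometimes name the same format differently, but it is a reasonable trigger for rejecting or logging the upload.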

Karl
Is there a way to tell if my header I determine is correct, and more importantly universal for EVERY file of that type?
barfoon
You can only look up the canonical forms in a textbook or wiki. For instance, JPEG is a standard with a standard set of headers, and BMP is a different one. "Correct" is open to your application's interpretation.
Karl
A: 

PHP has a superglobal, $_FILES, that holds information such as the size and type of each uploaded file. It looks like the type is taken from some sort of header, not the extension, but I may be wrong.

There is an example of it on the w3schools site.

I am going to test whether it can be tricked when I get a chance.

UPDATE:

Everyone else probably knew this, but $_FILES can be tricked. I was able to determine it this way:

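$arg = escapeshellarg($_FILES["file"]["tmp_name"]);
// system() echoes the output and puts only the exit status in its second
// argument, so capture the output with shell_exec() instead.
$type = shell_exec("file -b $arg");
echo "Real type:  " . $type;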

It basically uses Unix's file command. There are probably better ways, but I haven't used PHP in a while. I usually avoid using system commands if possible.

gpojd
A: 

Others have already mentioned Fileinfo, which I think is the correct solution, but I'll add this in case you can't use it for some reason. Most (all?) *nix distros include a command called file that, when run on a file, outputs its type. It has a switch to print either a human-readable description (the default) or the MIME type. You could have your script invoke this program on the uploaded file and read the result. Again, this is not the preferred approach. If you're on Windows, this utility is available through Cygwin.
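A sketch of that fallback, assuming a `file` binary that supports `-b` (brief output) and `--mime-type`, as current GNU/Linux versions do: use Fileinfo when the extension is available and only shell out otherwise. `detect_mime()` is a hypothetical helper name.

```php
<?php
// Sketch: prefer the Fileinfo extension; fall back to the external `file`
// command only if the extension is missing. Assumes `file` understands
// -b and --mime-type.
function detect_mime(string $path): ?string
{
    if (function_exists('finfo_open')) {
        $finfo = finfo_open(FILEINFO_MIME_TYPE);
        $type = $finfo ? finfo_file($finfo, $path) : false;
        if ($finfo) {
            finfo_close($finfo);
        }
        return $type === false ? null : $type;
    }
    // escapeshellarg() keeps a hostile file name from injecting shell syntax.
    $out = shell_exec('file -b --mime-type ' . escapeshellarg($path));
    return is_string($out) && trim($out) !== '' ? trim($out) : null;
}
```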

rmeador
A: 

Is simply checking the MIME type enough? I am assuming that changing the extension on a file doesn't change its MIME type?

Is MIME type a strong enough indicator to go by here?

Thanks for all of the responses thus far.

barfoon
The mime type is supplied by the browser and is not verified by PHP. If you use that to determine the file type, as suggested by some, you'll have the same problem of ensuring that it's not another file type masked with a different mime type. Karl's or Paul's answer is the best so far.
bmb
I wouldn't even trust that. What you really want is a good policy on this: don't trust uploaded files (so don't run them automatically), and scan them.
cbrulak
The functions I outlined look at the file's contents, not its name, so you can have a high degree of confidence that the MIME type they return is correct. Without a virus scanner, though, you can't be entirely sure the file is benign (e.g. an unknown JPEG exploit).
Paul Dixon
A: 

Is simply checking the MIME type enough? I am assuming that changing the extension on a file doesn't change its MIME type? Is MIME type a strong enough indicator to go by here?

It really depends on how it's used.

  • If you only accept uploads and serve them back as downloads, then the type hardly matters, since the file never executes on your server.
  • If it's handled by the web server, then it's going to be dependent on how the web server is configured, though subject to most of the rest of these comments.
  • If it's an image, it will either display, or not, or be the target of image library exploits. But only those.
  • Something like a PDF file may not affect your server, but rather the computer of the person accessing the file.
  • If it's going to be passed to a function like "system()", then we're back to OS behavior: it is treated as if it were "double-clicked", and the file extension might even be considered.
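For the first case (the file is only stored and downloaded again), here is a sketch of serving it so the browser saves it instead of rendering or running it. `send_as_download()` is a hypothetical helper; `Content-Disposition: attachment` asks the browser to download the file, and `X-Content-Type-Options: nosniff` tells browsers that honor it not to guess a different type from the content.

```php
<?php
// Sketch: serve a stored upload strictly as a download.
// $path and $downloadName are values your app would look up (hypothetical).
function send_as_download(string $path, string $downloadName): void
{
    header('Content-Type: application/octet-stream');
    header('X-Content-Type-Options: nosniff');
    // Escape quotes and backslashes so the name cannot break the header.
    header('Content-Disposition: attachment; filename="'
        . addcslashes($downloadName, "\"\\") . '"');
    header('Content-Length: ' . filesize($path));
    readfile($path);
}
```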
gbarry