How do you go about verifying the type of an uploaded file reliably without using the extension? I'm guessing that you have to examine the header / read some of the bytes, but I really have no idea how to go about it. I'm using C# and ASP.NET.

Thanks for any advice.

+1  A: 

The first few bytes of a file will often tell you the file type. See, for example,
http://www.garykessler.net/library/file_sigs.html
http://www.astro.keele.ac.uk/oldusers/rno/Computing/File_magic.html

Use System.IO to read the bytes as binary after the upload.
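
As a rough sketch of that step, the helper below reads up to the first N bytes from any Stream. The names HeaderReader and ReadHeader are made up for illustration, and how you obtain the stream depends on your upload component, so treat that part as an assumption.

```csharp
using System;
using System.IO;

public static class HeaderReader
{
    // Reads up to maxBytes from the current position of the stream.
    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    public static byte[] ReadHeader(Stream stream, int maxBytes)
    {
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes)
        {
            int read = stream.Read(buffer, total, maxBytes - total);
            if (read == 0) break; // end of stream
            total += read;
        }

        // Trim the result when the file is shorter than maxBytes.
        byte[] header = new byte[total];
        Array.Copy(buffer, header, total);
        return header;
    }
}
```

For a file already saved to disk, something like `using (FileStream fs = File.OpenRead(path)) { byte[] header = HeaderReader.ReadHeader(fs, 20); }` should be enough to get the bytes to compare against a signature list.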

I'm curious, though, why you can't rely on the Content-Type header?

Corey Trager
because I'm using a component that streams everything as binary!
flesh
The first link is much more comprehensive than the second.
Jonathan Leffler
You can't rely on Content-Type because the client may be hostile and deliberately claim an incorrect Content-Type as part of an attack. Trusting Content-Type is little, if any, better than trusting file extensions.
Dave Sherohman
How is trusting the content type, or file extension, any worse than trusting the header (first x bytes) of a file, which could also be forged?
Kibbee
+2  A: 

That indeed is what the Unix file program does, with greater or lesser degrees of reliability. In part, it depends on whether the programs whose files you are trying to detect emit a file header; the program tar is notorious for not doing so. It also depends on how many types of files you plan to recognize, but it might well be simplest to use an implementation of file; it recognizes many file types, and modern versions are extensible via a file of extra file type definitions that can handle a multitude of scenarios.
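
To illustrate the table-driven idea (this is only a toy, not how file itself is implemented), here is a minimal sketch in C#. The class name MagicTable, the labels, and the choice of entries are all hypothetical. Each entry carries an offset because not every format keeps its marker at byte 0: POSIX-style (ustar) tar archives, for instance, have it at offset 257, and older tar formats have none at all.

```csharp
using System.Text;

public static class MagicTable
{
    // One table entry: the bytes expected at a given offset, plus a label.
    private sealed class Signature
    {
        public readonly int Offset;
        public readonly byte[] Magic;
        public readonly string Name;

        public Signature(int offset, byte[] magic, string name)
        {
            Offset = offset;
            Magic = magic;
            Name = name;
        }
    }

    // A handful of well-known signatures; a real table would be much longer.
    private static readonly Signature[] Signatures =
    {
        new Signature(0, new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, "png"),
        new Signature(0, new byte[] { 0xFF, 0xD8, 0xFF }, "jpeg"),
        new Signature(0, Encoding.ASCII.GetBytes("GIF8"), "gif"),
        new Signature(0, Encoding.ASCII.GetBytes("%PDF"), "pdf"),
        new Signature(0, new byte[] { 0x50, 0x4B, 0x03, 0x04 }, "zip"),
        // ustar tar archives keep their marker at offset 257, not at the start.
        new Signature(257, Encoding.ASCII.GetBytes("ustar"), "tar"),
    };

    // Returns the label of the first matching entry, or null if none match.
    public static string Identify(byte[] data)
    {
        foreach (Signature sig in Signatures)
        {
            // Skip entries that would read past the end of the data.
            if (data.Length < sig.Offset + sig.Magic.Length) continue;

            bool match = true;
            for (int i = 0; i < sig.Magic.Length && match; i++)
            {
                match = data[sig.Offset + i] == sig.Magic[i];
            }
            if (match) return sig.Name;
        }
        return null;
    }
}
```

Note that a match only says what the file claims to be at that offset. The zip signature, for example, is shared by every format built on zip containers, so a table like this is a first filter rather than proof.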

Jonathan Leffler
A: 

Wotsit is a good resource for finding out the magic numbers for various file types.

Lou Franco
A: 

OK, so from the above links I now know that I am looking for 'ff d8 ff e0' to positively identify a .jpg file, for example.

In my code I can read the first twenty bytes no problem:

                byte[] b = new byte[20];
                using (FileStream fs = File.OpenRead(filePath))
                {
                    fs.Read(b, 0, b.Length); // returns the number of bytes actually read
                }

So (please excuse my total inexperience here), how do I check whether the byte array contains 'ff d8 ff e0'?

flesh
+2  A: 

Here's a quick-and-dirty response to the followup question you posted:

// Signature of a JFIF-style JPEG; b is the buffer read from the file.
byte[] jpg = new byte[] { 0xFF, 0xD8, 0xFF, 0xE0 };
bool match = true;
for (int i = 0; i < jpg.Length; i++)
{
    if (jpg[i] != b[i])
    {
        match = false;
        break;
    }
}
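
If you want something slightly less quick-and-dirty, the same comparison can be wrapped in a helper that also copes with buffers shorter than the signature. This is only a sketch, and the names SignatureCheck, StartsWith and LooksLikeJpeg are invented here. It compares just the first three bytes, on the assumption that you want to accept Exif-style JPEGs too: those usually carry E1 rather than E0 as the fourth byte.

```csharp
public static class SignatureCheck
{
    // True if data begins with every byte of signature, in order.
    public static bool StartsWith(byte[] data, byte[] signature)
    {
        if (data == null || signature == null || data.Length < signature.Length)
        {
            return false;
        }
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }

    // FF D8 is the JPEG start-of-image marker; the following FF begins
    // the next marker (E0 for JFIF, E1 for Exif, among others).
    public static bool LooksLikeJpeg(byte[] data)
    {
        return StartsWith(data, new byte[] { 0xFF, 0xD8, 0xFF });
    }
}
```

With the 20-byte buffer from the question, the check becomes `bool match = SignatureCheck.LooksLikeJpeg(b);`.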
Jon B
A: 

Reading the contents of the file is the foolproof way. Since you are building it in .NET, you could probably check the MIME type of the uploaded file.

You can DllImport urlmon.dll to help. See the post at: http://coding-passion.blogspot.com/2008/11/validating-file-type.html
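
For reference, a sketch of what such a call can look like, using the FindMimeFromData function exported by urlmon.dll. This is not necessarily the code from the linked post. The wrapper names MimeSniffer and GuessMimeType are made up, the P/Invoke declaration is the commonly circulated one, and releasing the returned string with Marshal.FreeCoTaskMem is an assumption based on common usage. It only does anything useful on Windows; elsewhere the wrapper simply returns null.

```csharp
using System;
using System.Runtime.InteropServices;

public static class MimeSniffer
{
    // P/Invoke declaration for FindMimeFromData in urlmon.dll (Windows only).
    [DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    private static extern int FindMimeFromData(
        IntPtr pBC,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
        [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I1, SizeParamIndex = 3)] byte[] pBuffer,
        int cbSize,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
        int dwMimeFlags,
        out IntPtr ppwzMimeOut,
        int dwReserved);

    // Returns the MIME type urlmon guesses from the leading bytes of a file,
    // or null if the call fails or urlmon.dll is not available.
    public static string GuessMimeType(byte[] header)
    {
        if (header == null || header.Length == 0) return null;

        IntPtr mimePtr = IntPtr.Zero;
        try
        {
            int hr = FindMimeFromData(IntPtr.Zero, null, header, header.Length,
                                      null, 0, out mimePtr, 0);
            // An HRESULT of 0 (S_OK) means a MIME string was returned.
            return hr == 0 ? Marshal.PtrToStringUni(mimePtr) : null;
        }
        catch (DllNotFoundException) { return null; }
        catch (EntryPointNotFoundException) { return null; }
        finally
        {
            // Assumption: the string is released with CoTaskMemFree.
            if (mimePtr != IntPtr.Zero) Marshal.FreeCoTaskMem(mimePtr);
        }
    }
}
```

Pass it the first bytes of the upload, not the file name or the client's Content-Type, so the guess is based on the data alone. For data it does not recognize, expect a generic type back rather than a failure, so compare the result against the list of types you actually accept.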

And to clarify regarding Content-Type: it is almost always driven by the extension of the file. So even if a .zip file has its extension renamed to .txt, the content type will still report plain text.