I am working on an ASP web page that handles file uploads. Only certain types of files are allowed to be uploaded, like .XLS, .XML, .CSV, .TXT, .PDF, .PPT, etc.

I have to decide if a file really has the same type as the extension shows. In other words if a trojan.exe was renamed to harmless.pdf and uploaded, the application must be able to find out that the uploaded file is NOT a .PDF file.

What techniques would you use to analyze these uploaded files? Where can I get the best information about the format of these files?

+3  A: 

Check the file headers against those of the "safe" file types. Executables have their own distinctive headers, and you can probably detect those. You would have to be familiar with every format that you intend to accept, however.

Andrei Krotkov
+4  A: 

One way would be to check for certain signatures or magic numbers in the files. This page has a handy list of known file signatures and seems quite up to date:

http://www.garykessler.net/library/file_sigs.html
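As a rough illustration of the signature check, here is a minimal Python sketch (not the asker's ASP/C# stack, but it should port easily). The function names and the small signature table are assumptions made for this example; only a few well-known signatures are listed, and plain-text formats such as .TXT or .CSV have no signature at all, so they cannot be verified this way.

```python
import os

# A few well-known leading byte signatures. This table is illustrative,
# not exhaustive; extend it from a maintained signature list.
OLE2 = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy Office compound file
SIGNATURES = {
    ".pdf": [b"%PDF-"],
    ".xls": [OLE2],
    ".ppt": [OLE2],
    ".xlsx": [b"PK\x03\x04"],                # Office Open XML is a ZIP archive
}

# Prefixes that mark common executable formats (Windows PE/DOS, ELF).
EXECUTABLE_PREFIXES = (b"MZ", b"\x7fELF")


def extension_matches_content(filename, head):
    """Compare the first bytes of an upload with its claimed extension.

    Returns True/False when a signature is known for the extension,
    or None when no signature is known (e.g. .txt, .csv).
    """
    ext = os.path.splitext(filename)[1].lower()
    expected = SIGNATURES.get(ext)
    if expected is None:
        return None
    return any(head.startswith(sig) for sig in expected)


def looks_like_executable(head):
    """True if the leading bytes match a known executable prefix."""
    return head.startswith(EXECUTABLE_PREFIXES)


if __name__ == "__main__":
    renamed_exe = b"MZ\x90\x00\x03\x00\x00\x00"
    print(extension_matches_content("harmless.pdf", renamed_exe))
    print(looks_like_executable(renamed_exe))
```

A matching signature only shows that the file starts like the claimed format; it says nothing about whether the rest of the file is well formed or harmless.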

Kev
+2  A: 

I know you said C#, but this could maybe be ported. Also, it has an XML file already containing many descriptors for common file types.

It's a Java library called JMimeMagic. It's here: http://jmimemagic.sourceforge.net/

sjbotha
+1  A: 

Maybe you could approach this from a different direction. Instead of identifying all the file types that are uploaded (Excel alone seems like a mess to me, because it has several formats these days), why not run all the uploads through a virus scanner? A wide variety of files can contain viruses and trojans. It may be more work for your server, but it's the safest solution.

Then it's up to the users to correctly identify their file types, which seems reasonable. Adding in a lot of code (that will need to be tested too) just to double check your users seems like a big step. If I say it's a .pdf2 file will you rename it to .pdf? If this is in a corporate environment then it's reasonable to expect the users to have correct extensions on their files. I'd track who uploaded what as well. If it's public then scanning for file types might be worthwhile, but I'd absolutely do the virus scan as well.

jcollum
+2  A: 

"In other words if a trojan.exe was renamed to harmless.pdf and uploaded, the application must be able to find out that the uploaded file is NOT a .PDF file."

That's not really a problem. If a .exe was uploaded as a .pdf and you correctly served it back up to the downloader as application/pdf, all the downloader would get would be a broken PDF. They would have to manually rename it to .exe to get harmed.

The real problems are:

  1. Some browsers may sniff the content of the file and decide they know better than you about what type of file it is. IE is particularly bad at this, tending to prefer to render the file as HTML if it sees any HTML tags lurking near the start of the file. This is particularly unhelpful as it means script can be injected onto your site, potentially compromising any application-level security (cookie stealing et al). Workarounds include always serving the file as an attachment using Content-Disposition, and/or serving files from a different hostname, so it can't cross-site-script back onto your main site.

  2. PDF files are not safe anyway! They can be full of scripting, and have had significant security holes. Exploitation of a hole in the PDF reader browser plugin is currently one of the most common means of installing trojans on the web. And there is usually almost nothing you can do to detect the exploits, as they can be highly obfuscated.
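The Content-Disposition workaround from point 1 could be sketched like this in Python. The function name `download_headers` is hypothetical, and the `X-Content-Type-Options: nosniff` header is an addition not mentioned above (it is honoured by IE 8 and later); treat the whole thing as an assumption-laden sketch rather than a complete defence.

```python
import re


def download_headers(filename, content_type="application/octet-stream"):
    """Build response headers that force a download instead of inline rendering.

    The filename is reduced to a conservative character set so that it
    cannot break out of the quoted header value.
    """
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", filename)
    return {
        "Content-Type": content_type,
        # "attachment" asks the browser to save the file rather than display it.
        "Content-Disposition": f'attachment; filename="{safe_name}"',
        # Assumption: added here to discourage content sniffing.
        "X-Content-Type-Options": "nosniff",
    }


if __name__ == "__main__":
    for name, value in download_headers("harmless.pdf", "application/pdf").items():
        print(f"{name}: {value}")
```

Serving these responses from a separate hostname, as suggested above, is a deployment decision and is not shown here.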

bobince
+1  A: 

On *NIX systems we have a utility called file(1). Try to find something similar for Windows, although the file utility itself has been ported.
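If file(1) is installed on the server, an upload handler could shell out to it. The following Python sketch assumes the `file` binary is on the PATH and supports the `--brief` and `--mime-type` options; the helper name `detect_mime_type` is made up for this example.

```python
import shutil
import subprocess


def detect_mime_type(path):
    """Return the MIME type reported by file(1), or None if it is unavailable."""
    file_cmd = shutil.which("file")
    if file_cmd is None:
        return None  # file(1) is not installed on this machine
    result = subprocess.run(
        # "--" stops option parsing so odd file names are not read as flags.
        [file_cmd, "--brief", "--mime-type", "--", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    import sys

    for uploaded in sys.argv[1:]:
        print(uploaded, detect_mime_type(uploaded))
```

The reported type can then be compared with the type implied by the upload's extension.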

daniel