views:

353

answers:

8

My boss has come to me and asked how to enure a file uploaded through web page is safe. He wants people to be able to upload pdfs and tiff images (and the like) and his real concern is someone embedding a virus in a pdf that is then viewed/altered (and the virus executed). I just read something on a procedure that could be used to destroy stenographic information emebedded in images by altering least sifnificant bits. Could a similar process be used to enusre that a virus isn't implanted? Does anyone know of any programs that can scrub files?

Update: So the team argued about this a little bit, and one developer found a post about letting the file download to the file system and having the antivirus software that protects the network check the files there. The poster essentially said that it was too difficult to use the API or the command line for a couple of products. This seems a little kludgy to me, because we are planning on storing the files in the db, but I haven't had to scan files for viruses before. Does anyone have any thoughts or expierence with this?

http://www.softwarebyrob.com/2008/05/15/virus-scanning-from-code/

+4  A: 

I'd recommend running your uploaded files through antivirus software such as ClamAV. I don't know about scrubbing files to remove viruses, but this will at least allow you to detect and delete infected files before you view them.

pix0r
+1  A: 

You'd probably need to chain an actual virus scanner to the upload process (the same way many virus scanners ensure that a file you download in your browser is safe).

In order to do this yourself, you'd have to keep it up to date, which means keeping libraries of virus definitions around, which is likely beyond the scope of your application (and may not even be feasible depending on the size of your organization).

DannySmurf
A: 

With a antivirus software like clamAv, does it scan the internals of a byte file like pdf? I ask because the place I previously worked had really strict requirements of not sending Exes through email, and our way of bypassing it was to change the extension to .pdf. The email scanner ignored it, because it new it was supposed to be a byte (non-text) file.

Kevin
A: 

Yes, ClamAV should scan the file regardless of the extension.

pix0r
+1  A: 

It depends on your company's budget but there are hardware devices and software applications that can sit between your web server and the outside world to perform these functions. Some of these are hardware firewalls with anti-virus software built in. Sometimes they are called application gateways or application proxies.

Here are links to an open source gateway that uses Clam-AV: http://en.wikipedia.org/wiki/Gateway_Anti-Virus http://gatewayav.sourceforge.net/faq.html

Mark
+1  A: 

Viruses embedded in image files are unlikely to be a major problem for your application. What will be a problem is JAR files. Image files with JAR trailers can be loaded from any page on the Internet as a Java applet, with same-origin bindings (cookies) pointing into your application and your server.

The best way to handle image uploads is to crop, scale, and transform them into a different image format. Images should have different sizes, hashes, and checksums before and after transformation. For instance, Gravatar, which provides the "buddy icons" for Stack Overflow, forces you to crop your image, and then translates it to a PNG.

Is it possible to construct a malicious PDF or DOC file that will exploit vulnerabilities in Word or Acrobat? Probably. But ClamAV is not going to do a very good job at stopping those attacks; those aren't "viruses", but rather vulnerabilities in viewer software.

tqbf
A: 

Use a reverse proxy setup such as

www <-> HAVP <-> webserver

HAVP (http://www.server-side.de/) is a way to scan http traffic though ClamAV or any other commercial antivirus software. It will prevent users to download infected files. If you need https or anything else, then you can put another reverse proxy or web server in reverse proxy mode that can handle the SSL before HAVP

Nevertheless, it does not work at upload, so it will not prevent the files to be stored on servers, but prevent the files from being downloaded and thus propagated. So use it with a regular file scanning (eg clamscan).

A: 

Don't store files in your database, it will bloat making a sensible backup regime unnecessarily hard to maintain. Use your filesystem. Keep strings containing filepaths in your database.

Peter Wone