ansaurus

Question

Answer 1

+6 A:

First, store the uploaded content in a directory that is not directly accessible for download. If your app lives in ~/www/, consider putting your data in ~/data/.
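As a minimal sketch of that layout (the ~/www and ~/data locations come from the example above; the function name and the use of a random hex ID are my own assumptions, not something Django provides), the server can pick the on-disk name itself and refuse to store anything under the web root:

```python
import uuid
from pathlib import Path

# Hypothetical layout: the app is served from ~/www, uploads live in ~/data.
WEB_ROOT = Path.home() / "www"
UPLOAD_ROOT = Path.home() / "data"


def new_storage_path() -> Path:
    """Return a server-chosen path under UPLOAD_ROOT for a new upload.

    The user-supplied file name is never used on disk.
    """
    content_id = uuid.uuid4().hex  # 32 hex characters, no extension
    path = (UPLOAD_ROOT / content_id).resolve()
    # Refuse to proceed if the upload directory sits inside the web root.
    if WEB_ROOT.resolve() in path.parents:
        raise RuntimeError("upload directory must not be inside the web root")
    return path
```

Because the name is generated on the server, a user cannot choose an extension or path that the web server might treat specially.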

Second, determine what kind of file the user uploaded, and then create rules for each file type.

You can't trust the file extension, so use something like Fileinfo to inspect the content. Then create a validator for each MIME type. ImageMagick can validate image files. For higher security, you may have to run a virus scanner over files such as PDFs and Flash files. For HTML, consider limiting users to a subset of tags.
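To illustrate looking at content rather than the extension, here is a rough standard-library sketch that checks the leading "magic" bytes of a few formats. The signature table and function name are illustrative assumptions and far from exhaustive. A match only tells you what the file claims to be, so a per-type validator is still needed afterwards.

```python
# Leading "magic" bytes for a few common formats (illustrative, not exhaustive).
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime_type(data: bytes):
    """Guess a MIME type from the first bytes of the content, or return None."""
    for signature, mime_type in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None
```

Anything that comes back as None can simply be rejected, which fits a white-list approach.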

I can't find a Python equivalent of the Fileinfo module, though it's always possible to exec /usr/bin/file -i on the file. Most systems that allow uploads then create a content name or ID. They use mod_rewrite to parse the URL and find the content on disk. Once the content is found, it's returned to the user using sendfile or something similar. Routing requests this way also lets you apply access rules: for example, until the content is approved, maybe only the user who uploaded it is allowed to view it.
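The approval rule in that last example could be sketched like this. The Upload record and its fields are hypothetical, invented only for illustration. In a Django app this would more likely be a model plus a check inside the view that serves the file.

```python
from dataclasses import dataclass


@dataclass
class Upload:
    """Hypothetical record kept for each stored upload."""
    content_id: str
    owner: str
    approved: bool = False


def can_view(upload: Upload, username: str) -> bool:
    """Approved content is public; unapproved content is visible to its uploader only."""
    return upload.approved or upload.owner == username
```

For example, can_view(Upload("abc123", "alice"), "bob") is False until the record's approved flag is set.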

brianegge 2009-11-17 00:32:38

Thanks for your reply. I'm using Django, so preferably I want Python equivalents of Fileinfo and ImageMagick. Good advice about storing content in a private directory until I'm satisfied it's safe.

Alasdair 2009-11-17 00:57:51

Answer 2

+8 A:

For images you might be able to just use Python Imaging Library (PIL).

Image.open(filepath)

If the file is not an image, an exception will be thrown. I'm pretty new to Python/Django so someone else might have a better way of validating images.
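A slightly fuller sketch of the same idea, assuming the Pillow fork of PIL is installed (hence the `from PIL import Image` import); the function name is my own. It calls verify(), which checks the file for damage without decoding the pixel data.

```python
import io


def is_valid_image(data: bytes) -> bool:
    """Return True if Pillow can parse `data` as an image."""
    from PIL import Image  # assumption: the Pillow package is installed

    try:
        with Image.open(io.BytesIO(data)) as image:
            image.verify()  # integrity check only; does not decode pixel data
    except Exception:
        # Pillow raises several exception types for bad input; treat any as invalid.
        return False
    return True
```

An image object cannot be used after verify(), so reopen the data if you need to process it further.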

Matt McCormick 2009-11-17 18:25:32

django.forms includes an ImageField that automatically does the validation for you.

Alex Gaynor 2009-11-24 08:46:53

Answer 3

+2 A:

You can validate HTML files with BeautifulSoup.

barbuza 2009-11-24 13:03:05

I'm aware of Beautiful Soup, but the following answer on Stack Overflow suggests that it won't catch all XSS attacks: http://stackoverflow.com/questions/699468/python-html-sanitizer-scrubber-filter/812785#812785

Alasdair 2009-11-24 17:13:06

If you want to catch XSS, you can try the Genshi sanitizer on the HTML produced by BeautifulSoup.

barbuza 2009-11-25 16:01:08

A valid upload might not be valid HTML.

Carson 2009-11-25 19:44:03

Answer 4

+1 A:

'Trusted users' is a subjective term. Are they people you know in person, or anyone who has created an account on your app? Don't give access to your filesystem to people you don't know in person.

Giving someone the ability to upload a file is always somewhat dangerous, and I think it should be avoided. I faced a similar problem last week with the automatic upload of HTML code, and I decided to store it in the database. I think that in most cases you can use the database rather than the file system.

One problem with validation is that you'll have to write a new validator for every type of file. That can become a limitation in the future and a big task in some cases.

So I would recommend reconsidering a database-based design.

luc 2009-11-24 13:27:36

I know the 'trusted users' and trust them not to upload malicious content. The danger is whether I can trust them to keep their passwords from falling into the wrong hands.

Alasdair 2009-11-24 17:02:09

Password stealing is another problem. Django's salted hashes should help, but you never know. That's another argument for not giving access to your filesystem.

luc 2009-11-26 15:01:24

Answer 5

+2 A:

This is a little bit specific to your hosting environment, but here is what I do:

Serve all user-uploaded content with Nginx instead of Apache, and serve it all as static content (it will not run any PHP or CGI, even if users upload it).

Jiaaro 2009-11-24 15:46:46

Thanks for your reply. I'd be interested to know how I can achieve a similar result with lighttpd.

Alasdair 2009-11-24 17:16:54

Most web servers are capable of serving static files. See the lighttpd docs on how to set up a directory for that.

Marcus Lindblom 2009-11-25 16:30:36

Answer 6

+5 A:

All the answers are focusing on validating files. This is pretty much impossible.

The Django devs aren't asking you to validate whether files can be executed as cgi files. They are just telling you not to put them in a place where they will be executed.

You should put all Django stuff in a dedicated Django directory. That Django code directory should not contain static content. Don't put user files in the Django source repository.

If you are using Apache2, check out the basic cgi tutorial: http://httpd.apache.org/docs/2.0/howto/cgi.html

Apache2 might be set up to run any files in the ScriptAlias folder. Don't put user files in the /cgi-bin/ or /usr/local/apache2/cgi-bin/ folders.

Apache2 might be set to serve CGI files, depending on the AddHandler cgi-script settings. Don't let users submit files with extensions like .cgi or .pl.
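One way to enforce the extension rule is an allow-list rather than a block-list. The sketch below is an assumption-laden example: the set of extensions and the function name are made up. It deliberately rejects names with more than one dot, since Apache's mod_mime can act on an extension that is not the last one (e.g. shell.cgi.png).

```python
# Illustrative allow-list; adjust to the file types the site actually accepts.
ALLOWED_EXTENSIONS = {"gif", "jpg", "jpeg", "png"}


def has_allowed_extension(filename: str) -> bool:
    """Accept only names of the form <base>.<ext> with an allow-listed extension."""
    parts = filename.lower().split(".")
    # Exactly one dot, a non-empty base name, and a known extension.
    if len(parts) != 2 or not parts[0]:
        return False
    return parts[1] in ALLOWED_EXTENSIONS
```

This is stricter than necessary (it also rejects my.photo.png), which seems a reasonable trade-off if you rename stored files anyway.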

However, you do need to sanitize user-submitted files so they are safe to open on other clients' machines. Submitted HTML is unsafe for other users. It won't hurt your server; your server will just send it back to whoever requests it. Get an HTML sanitizer.
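To show roughly what an HTML sanitizer does, here is a toy allow-list filter built on the standard library's html.parser. The tag list, class name and URL rule are all my own assumptions. It is only a sketch of the technique and has not been tested against real XSS payloads, so a maintained sanitizer library is the safer choice in production.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br", "a"}
DROP_CONTENT_TAGS = {"script", "style"}  # discard these tags and their content
VOID_TAGS = {"br"}  # tags that never get a closing tag


class AllowListSanitizer(HTMLParser):
    """Re-emit only allow-listed tags; drop all attributes except safe hrefs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(self._open_tag(tag, attrs))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data))  # text is always re-escaped

    @staticmethod
    def _open_tag(tag, attrs):
        # Keep only href on links, and only for http(s) URLs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.lower().startswith(("http://", "https://")):
                    return f'<a href="{escape(value, quote=True)}">'
        return f"<{tag}>"


def sanitize_html(text: str) -> str:
    parser = AllowListSanitizer()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

The point of the design is that nothing from the input is copied through verbatim: tags are rebuilt from the allow-list and text is re-escaped.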

Also, SVG may be unsafe. It's had bugs in the past. SVG is an XML document that can contain JavaScript, so it can be malicious.

PDF is ... tricky. You could convert it to an image (if you really had to), or provide an image preview (and let users download at their own risk), but it would be a pain for people trying to use it.

Consider a white-list of files that are OK. A virus embedded in a gif, jpeg or png file will just look like a corrupt picture (or fail to display). If you want to be paranoid, convert them all to a standard format using PIL (hey, you could also check sizes). Sanitized HTML should be OK (stripping out script tags isn't rocket science). If the sanitization is sucking cycles (or you're just cautious), you could put it on a separate server, I guess.
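The "convert them all to a standard format and check sizes" idea might look something like this, again assuming Pillow is installed. The choice of PNG as the target, the RGBA mode, and the 4096-pixel limit are arbitrary assumptions for the sketch.

```python
import io

MAX_DIMENSION = 4096  # hypothetical limit on width and height, in pixels


def reencode_as_png(data: bytes) -> bytes:
    """Decode an uploaded image and write it back out as a fresh PNG."""
    from PIL import Image  # assumption: the Pillow package is installed

    with Image.open(io.BytesIO(data)) as image:
        if image.width > MAX_DIMENSION or image.height > MAX_DIMENSION:
            raise ValueError("image dimensions exceed the configured limit")
        # Normalise to a single pixel mode before re-encoding.
        converted = image.convert("RGBA")
    output = io.BytesIO()
    converted.save(output, format="PNG")
    return output.getvalue()
```

A file that Pillow cannot decode raises an exception here, so the same call doubles as a validity check.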

wisty 2009-11-25 11:20:19


Validating Uploaded Files in Django
