views:

612

answers:

5

I've been trying to work out the 'best practices' way to manage file uploads with Turbogears 2 and have thus far not really found any examples. I've figured out a way to actually upload the file, but I'm not sure how reliable it us.

Also, what would be a good way to get the uploaded files name?

    file = request.POST['file']
    permanent_file = open(os.path.join(asset_dirname,
        file.filename.lstrip(os.sep)), 'w')
    shutil.copyfileobj(file.file, permanent_file)
    file.file.close()
    this_file = self.request.params["file"].filename 
    permanent_file.close()

So assuming I'm understanding correctly, would something like this avoid the core 'naming' problem? id = UUID.

    file = request.POST['file']
    permanent_file = open(os.path.join(asset_dirname,
        id.lstrip(os.sep)), 'w')
    shutil.copyfileobj(file.file, permanent_file)
    file.file.close()
    this_file = file.filename
    permanent_file.close()
+1  A: 

I don't know much about Turbogears and whether it can provide anything to avoid the following, but it seems to me that this code is fraught with danger. It may be possible for a malicious user to overwrite (or create) any file that the Turbogears python process has write access to.

What if asset_dirname is /tmp, the contents of file.filename is ../../../../../../../etc/passwd and the contents of the file root::0:0:root:/root:/bin/bash? In a UNIX environment this code (permissions pending) would open the file /tmp/../../../../../../../etc/passwd in truncate mode and then copy the contents of the uploaded file to it - effectively overwriting your system's password file and specifying a root user without a password. Presumably there are nasty things that can be done to a Windows machine too.

OK, this is an extreme example that requires that python is running as root (no one does that, do they?). Even if python is running as a low-priveleged user, previously uploaded files could be overwritten at will.

To summarise, don't trust user input, in this case the user supplied filename that is available in file.filename.

mhawke
Aye, I realize that it's a risky system. I'm not really sure with how python usually handles files (Moving from a PHP background so it's a bit of a hard change, although I like python.)That's why I'm hoping someone who's designed a better Turbogears upload system would know and help out.And thanks, that's a useful first step. :)
William Chambers
I think that turbogears provides a set of useful controls called Tosca Widgets and that there is a widget for file uploads. I'd advise looking into that as you'll hopefully end up with something that standard (wrt turbogears) and which will hopefully avoid the kinds of vulnerabilities that I've pointed out in your example.
mhawke
Thank you 'very' much for that idea. It looks like Tosca Widgets does indeed have a 'file field' field. Their docs are somewhat confusing but I've found this so far.http://toscawidgets.org/documentation/tw.forms/modules/fields/basic_fields.html#filefield
William Chambers
A: 

Isn't turbogears just pylons with extras? You could check out the help there:

http://wiki.pylonshq.com/display/pylonsdocs/Form+Handling#file-uploads

However, that still contains the potential security flaw that mhawke mentioned:

os.path.join(permanent_store, myfile.filename.lstrip(os.sep))

Same as above really if filename somehow was ../../../../../etc/passwd then you could replace that file...

So you could just get the actual filename like so:

os.path.join(permanent_store, myfile.filename.split(os.sep).pop())
Ross
With `os.path.join(permanent_store,myfile.filename.split(os.sep).pop())`There's still the problem of trusting user supplied data. What if users uploaded files that had the same basename? If the basename collides, previsouly uploaded files would still be overwritten.
mhawke
So would a workable solution be to strip all forward slashes from the filename?
William Chambers
Well the pop just uses the filename and ignores the path and stops the security flaw in accessing outside your temp location.
Ross
+1  A: 

@mhawke - you're right you have to handle that - depends on what you are doing with the file, if it doesn't matter if there is a name collision eg you only care for the latest version of some data then theres probably no issue, or if the filename isn't actually important just the file contents, but its still bad practice.

You could use a named tempfile in a tmp dir, then move the file once validated to its final location. Or you could check the filename doesn't already exist like so:

file.name = slugify(myfile.filename)
name, ext = os.path.splitext(file.name)
while os.path.exists(os.path.join(permanent_store, file.name)):
    name += '_'
    file.name = name + ext

raw_file = os.path.join(permanent_store, file.name)

The slugify method would be used to tidy up the filename...

Ross
Now we're getting into race conditions. There will be an interval of time between checking whether the file already exists and actually creating it. What if 2 users upload a file of the same name at the same time? `os.path.exists()` could return `False` in both instances, then when the files are created, one file will overwrite the other.
mhawke
A: 

Werkzeug has a very good helper function for securing file names called secure_filename. I think you can adopt and use it.

Mengu
A: 

I don't know if you've seen this, but its pretty relevant. Upload Form

slestak