views:

4097

answers:

2

Problem: When POSTing data with Python's urllib2, all data is URL encoded and sent as Content-Type: application/x-www-form-urlencoded. When uploading files, the Content-Type should instead be set to multipart/form-data and the contents be MIME encoded. A discussion of this problem is here: http://code.activestate.com/recipes/146306/

To get around this limitation some sharp coders created a library called MultipartPostHandler which creates an OpenerDirector you can use with urllib2 to mostly automatically POST with multipart/form-data. A copy of this library is here: http://peerit.blogspot.com/2007/07/multipartposthandler-doesnt-work-for.html

I am new to Python and am unable to get this library to work. I wrote out essentially the following code. When I capture it in a local HTTP proxy, I can see that the data is still URL encoded and not multi-part MIME encoded. Please help me figure out what I am doing wrong or a better way to get this done. Thanks :-)

FROM_ADDR = '[email protected]'

try:
    data = open(file, 'rb').read()
except:
    print "Error: could not open file %s for reading" % file
    print "Check permissions on the file or folder it resides in"
    sys.exit(1)

# Build the POST request
url = "http://somedomain.com/?action=analyze"    
post_data = {}
post_data['analysisType'] = 'file'
post_data['executable'] = data
post_data['notification'] = 'email'
post_data['email'] = FROM_ADDR

# MIME encode the POST payload
opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)
urllib2.install_opener(opener)
request = urllib2.Request(url, post_data)
request.set_proxy('127.0.0.1:8080', 'http') # For testing with Burp Proxy

# Make the request and capture the response
try:
    response = urllib2.urlopen(request)
    print response.geturl()
except urllib2.URLError, e:
    print "File upload failed..."

EDIT1: Thanks for your response. I'm aware of the ActiveState httplib solution to this (I linked to it above). I'd rather abstract away the problem and use a minimal amount of code to continue using urllib2 how I have been. Any idea why the opener isn't being installed and used?

+3  A: 

Found this recipe to post multipart using httplib directly (no external libraries involved)

import httplib
import mimetypes

def post_multipart(host, selector, fields, files):
    content_type, body = encode_multipart_formdata(fields, files)
    h = httplib.HTTP(host)
    h.putrequest('POST', selector)
    h.putheader('content-type', content_type)
    h.putheader('content-length', str(len(body)))
    h.endheaders()
    h.send(body)
    errcode, errmsg, headers = h.getreply()
    return h.file.read()

def encode_multipart_formdata(fields, files):
    LIMIT = '----------lImIt_of_THE_fIle_eW_$'
    CRLF = '\r\n'
    L = []
    for (key, value) in fields:
        L.append('--' + LIMIT)
        L.append('Content-Disposition: form-data; name="%s"' % key)
        L.append('')
        L.append(value)
    for (key, filename, value) in files:
        L.append('--' + LIMIT)
        L.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (key, filename))
        L.append('Content-Type: %s' % get_content_type(filename))
        L.append('')
        L.append(value)
    L.append('--' + LIMIT + '--')
    L.append('')
    body = CRLF.join(L)
    content_type = 'multipart/form-data; boundary=%s' % BOUNDARY
    return content_type, body

def get_content_type(filename):
    return mimetypes.guess_type(filename)[0] or 'application/octet-stream'
nosklo
This seems right, but the poster module is more right.
bentford
+14  A: 

It seems that the easiest and most compatible way to get around this problem is to use the 'poster' module.

# test_client.py
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib2

# Register the streaming http handlers with urllib2
register_openers()

# Start the multipart/form-data encoding of the file "DSC0001.jpg"
# "image1" is the name of the parameter, which is normally set
# via the "name" parameter of the HTML <input> tag.

# headers contains the necessary Content-Type and Content-Length
# datagen is a generator object that yields the encoded parameters
datagen, headers = multipart_encode({"image1": open("DSC0001.jpg")})

# Create the Request object
request = urllib2.Request("http://localhost:5000/upload_image", datagen, headers)
# Actually do the request, and get the response
print urllib2.urlopen(request).read()

This worked perfect and I didn't have to muck with httplib. The module is available here: http://atlee.ca/software/poster/index.html

Dan
This is exactly what I needed! Kudos.
Andrew Keeton