Hi, I have to download some images from links. These links return a file in which a multipart MIME message and a TIFF image are embedded. I have written this code, but it downloads the file with the MIME wrapper still in it.

How can I remove the MIME wrapper from this file and get just the image? Can I do this with wget or curl?

My code:

import urllib

def download(url, local):
    # Save the response body found at url to the local path (Python 2 urllib API)
    urllib.urlretrieve(url, local)
    urllib.urlcleanup()

Thanks a lot.

A: 

You can use the iterators offered by the email package in the Python standard library -- despite its name the package is not just about email messages, but any MIME message no matter whether you got it in email or downloaded it from somewhere.

Build a message object with msg = email.message_from_file(somefile) (after import email, of course ;-), where somefile is an open file object for the file you downloaded, not its path.

Now, as the example in the URL I pointed you to shows, email.iterators._structure(msg) prints out the whole structure of the message (do this just to confirm the message was saved correctly).

Then, with email.iterators.typed_subpart_iterator, you can easily extract all images (each as a MIME-encoded message, of course). The get_payload method of each such message, called with decode=True, gives you the bytestring for the image payload, which you can write to a binary file.
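A minimal sketch of the steps above, written for Python 3 (the code elsewhere in this thread is Python 2). It uses email.message_from_bytes and msg.walk() in place of typed_subpart_iterator, and extract_images is a hypothetical helper name. It assumes the bytes passed in begin with a top-level Content-Type header that declares the multipart boundary; without that header the parser treats the data as a single non-multipart message.

```python
import email


def extract_images(raw_bytes):
    """Return a list of (content_type, payload_bytes) for every image part."""
    # Parse from bytes so binary payloads are not mangled by text decoding.
    msg = email.message_from_bytes(raw_bytes)
    images = []
    # walk() visits the message itself and every nested sub-part.
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            # decode=True undoes any Content-Transfer-Encoding and returns bytes.
            images.append((part.get_content_type(),
                           part.get_payload(decode=True)))
    return images
```

Each returned payload can then be written to a file opened in 'wb' mode.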

Alex Martelli
With a hex editor I can see that the file I download contains this header:

--_boundary_g9ct9yt0
MIME-Version: 1.0
Content-type: text/xml
Content-ID: urn:ogc:wcs:1.1:coverages
(the XML document is here)
--_boundary_g9ct9yt0
MIME-Version: 1.0
Content-type: image/tiff
Content-ID: wc_30s_CCCMA_A2a_2080_tmax_12.bil
binary_code_of_image
\Uffffffff
--_boundary_g9ct9yt0-

But the code that I wrote doesn't work.
michele
A: 

Here are links to an example of the file that I have to process:

original: http://www.2shared.com/photo/1C5pAlN2/original.html

to process: http://www.2shared.com/photo/ZLnACSjh/wget.html

The first is the correct image and the second is the file downloaded by the Python script.

I have written this code:

import email

try:
    f = open(file_path, "r")
    msg = email.message_from_file(f)
except email.Errors.MessageParseError:
    print "error"
# Take the decoded payload of the parsed message and save it as the image
image = msg.get_payload(decode=True)
localFile = open("/var/www/tmp/14c104b2-ea86-413a-9d9e-7da703b78853/map001.tif", 'wb')
localFile.write(image)

The output file differs from the original by only two characters (shown as dots below). The top of the original file contains:

..--_boundary_g9ct9yt0
MIME-Version: 1.0
Content-type: text/xml
Content-ID: urn:ogc:wcs:1.1:coverages

The top of the modified file contains:

--_boundary_g9ct9yt0
MIME-Version: 1.0
Content-type: text/xml
Content-ID: urn:ogc:wcs:1.1:coverages

So the image is not extracted. The file that I download contains one part with XML code and one part with the image.

How can I extract the image?

Thanks.

michele
A: 

With a hex editor I can see that the file I download contains this header:

--_boundary_g9ct9yt0
MIME-Version: 1.0
Content-type: text/xml
Content-ID: urn:ogc:wcs:1.1:coverages
(the XML document is here)
--_boundary_g9ct9yt0
MIME-Version: 1.0
Content-type: image/tiff
Content-ID: wc_30s_CCCMA_A2a_2080_tmax_12.bil
binary_code_of_image
\Uffffffff
--_boundary_g9ct9yt0-

But the code that I wrote doesn't work.

michele
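
Given a file that begins directly at the first boundary line, as in the dump above, one option is to split the raw bytes by hand. This is only a sketch in Python 3: iter_parts and extract_tiff are hypothetical helper names. It assumes that the first non-blank line is the boundary delimiter, that a blank line separates each part's headers from its body, and that the delimiter bytes never occur inside the image data.

```python
import re


def iter_parts(raw):
    """Yield (headers, body) for each part of a multipart body given as bytes."""
    # Assumption: the first non-blank line is the delimiter,
    # e.g. b"--_boundary_g9ct9yt0".
    delimiter = raw.lstrip().split(b"\n", 1)[0].rstrip()
    if not delimiter:
        return
    for chunk in raw.split(delimiter)[1:]:
        # Headers end at the first blank line; the body follows it.
        pieces = re.split(rb"\r?\n\r?\n", chunk.lstrip(b"\r\n"), maxsplit=1)
        if len(pieces) != 2:
            continue  # residue of the closing delimiter, such as b"--"
        header_block, body = pieces
        headers = {}
        for line in header_block.splitlines():
            name, _, value = line.partition(b":")
            key = name.strip().lower().decode("ascii", "replace")
            headers[key] = value.strip().decode("ascii", "replace")
        # The line break just before the next delimiter is not part of the body.
        body = re.sub(rb"\r?\n\Z", b"", body)
        yield headers, body


def extract_tiff(raw):
    """Return the body of the first image/tiff part, or None if there is none."""
    for headers, body in iter_parts(raw):
        if headers.get("content-type", "").lower().startswith("image/tiff"):
            return body
    return None
```

To use it, read the downloaded file in binary mode ('rb'), pass its contents to extract_tiff, and write the result to a file opened with 'wb'.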