Suppose I have a video file:

http://mydomain.com/thevideofile.mp4

How do I get the headers and the content-type of this file with Python, without downloading the entire file? I want it to return:

video/mp4

Edit: this is what I did. What do you think?

f = urllib2.urlopen(url)
params['mime'] = f.headers['content-type']
+12  A: 

Like so:

>>> import httplib
>>> conn = httplib.HTTPConnection("mydomain.com")
>>> conn.request("HEAD", "/thevideofile.mp4")
>>> res = conn.getresponse()
>>> print res.getheaders()

That will only download and print the headers because it is making a HEAD request:

Asks for the response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers, without having to transport the entire content.

(via Wikipedia)
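On Python 3, `httplib` was renamed `http.client`; a sketch of the same HEAD request with the modern module names (the helper function and its arguments here are illustrative, not part of the original answer):

```python
# Python 3 sketch of the same idea: httplib is now http.client.
import http.client

def fetch_content_type(host, path, port=80):
    """Issue a HEAD request and return the Content-Type header."""
    conn = http.client.HTTPConnection(host, port)
    conn.request("HEAD", path)          # headers only, no body
    res = conn.getresponse()
    content_type = res.getheader("Content-Type")
    conn.close()
    return content_type
```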

Brian McKenna
+1, yep, HEAD is indeed the way to go.
Alex Martelli
Is there anything wrong with mine? f = urllib2.urlopen(url) params['mime'] = f.headers['content-type']
TIMEX
@alex: yes, it will download the whole file.
Brian McKenna
@brian, please rephrase. It will not download the whole file.
ghostdog74
I have done some testing with ettercap. A HEAD request downloads about 400 bytes; the way alex suggested downloads the first 80k or so of the file and leaves the connection dangling.
gnibbler
+2  A: 

This is a higher-level answer than Brian's. Using the urllib machinery has the usual advantages, such as automatic redirect handling.

import urllib2

class HeadRequest(urllib2.Request):
    def get_method(self):
        return "HEAD"

url = "http://mydomain.com/thevideofile.mp4"
head = urllib2.urlopen(HeadRequest(url))
head.read()          # Returns an empty string and closes the connection
print head.headers.maintype
print head.headers.subtype
print head.headers.type
gnibbler
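On Python 3 the subclass isn't needed: `urllib2` became `urllib.request`, and `Request` accepts a `method` argument (since 3.3). A minimal sketch under those assumptions:

```python
# Python 3 sketch: Request takes a method argument, so no
# HeadRequest subclass is required.
import urllib.request

def head_content_type(url):
    """HEAD the URL and return its content type, e.g. "video/mp4"."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get_content_type()
```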
A: 

You can get the video type using the info() method or the headers dict:

f = urllib2.urlopen(url)
print f.headers['Content-Type']
print f.info()

A test run with a randomly selected AVI file found on the net that is more than 600 MB:

$ cat test.py
#!/usr/bin/env python
import urllib2
url="http://www.merseypirates.com/rjnsteve/rjnsteve/oem16.avi"
f=urllib2.urlopen(url)
print f.headers['Content-Type']

$ time python test.py
video/x-msvideo

real    0m4.931s
user    0m0.115s
sys     0m0.042s

It will only "take up bandwidth" when the file is actually downloaded, i.e. when packets are being sent to and from the socket.

ghostdog74
That will download the whole file.
Brian McKenna
Download the whole file? As in download to local disk so I have an actual physical file? No, it won't. Besides, the OP is asking what's wrong with this method, so I am showing him where he went wrong.
ghostdog74
It will make a request that will **download** the whole file. Of course it won't be stored to your filesystem but the request will block and waste bandwidth for *no reason*.
Brian McKenna
No it won't. If you read the docs, urlopen returns a file-like object; that's why you can do things like response.read(). It's only when you read() that "bandwidth is wasted".
ghostdog74
Try it out. Download something like http://www.charlesproxy.com/ and see how much the request downloads in an active REPL. `urlopen` blocks until it gets the headers and `Content-Length` so it may *seem* instant but it's actually downloading the content in the background. When you `read`, Python blocks for the content. So **it uses up bandwidth** when you call `urlopen` - *just in the background*.
Brian McKenna
Then you should have defined "download the whole file" clearly. With an AVI file of 600+ MB, it only takes a few seconds to get those headers.
ghostdog74
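The disagreement above can be checked against a local test server instead of a 600 MB file. A sketch (the counting handler, names, and byte sizes are illustrative, and Python 3's urllib.request stands in for urllib2): with a GET, the server writes the body onto the wire whether or not the client ever calls read(); with a HEAD, nothing beyond the headers is sent.

```python
# Sketch: a local server records how many body bytes each handler wrote.
import http.server
import threading
import urllib.request

class Handler(http.server.BaseHTTPRequestHandler):
    body = b"x" * 1024                  # stand-in for the video content
    body_bytes = {"GET": 0, "HEAD": 0}  # body bytes written per method

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "video/mp4")
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()            # headers only, no body

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.body)     # body goes out regardless of read()
        Handler.body_bytes["GET"] += len(self.body)

    def log_message(self, *args):
        pass                            # keep the demo quiet

def compare_get_and_head():
    srv = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/file.mp4" % srv.server_address[1]
    urllib.request.urlopen(url).close()  # GET; client never calls read()
    urllib.request.urlopen(
        urllib.request.Request(url, method="HEAD")).close()
    srv.shutdown()
    return Handler.body_bytes
```

Under these assumptions, compare_get_and_head() returns {"GET": 1024, "HEAD": 0}: the GET body was transmitted even though the client discarded the response unread.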