Suppose I have a video file:

http://mydomain.com/thevideofile.mp4

How do I get the headers and the content-type of this file with Python, without downloading the entire file? I want it to return:

video/mp4

Edit: this is what I did. What do you think?

f = urllib2.urlopen(url)
params['mime'] = f.headers['content-type']
+12  A: 

Like so:

>>> import httplib
>>> conn = httplib.HTTPConnection("mydomain.com")
>>> conn.request("HEAD", "/thevideofile.mp4")
>>> res = conn.getresponse()
>>> print res.getheaders()

That will only download and print the headers because it is making a HEAD request:

Asks for the response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers, without having to transport the entire content.

(via Wikipedia)
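On Python 3, `httplib` was renamed `http.client`; a sketch of the same HEAD request with the modern module names (the helper function and its arguments here are illustrative, not part of the original answer):

```python
# Python 3 sketch of the same idea: httplib is now http.client.
import http.client

def fetch_content_type(host, path, port=80):
    """Issue a HEAD request and return the Content-Type header."""
    conn = http.client.HTTPConnection(host, port)
    conn.request("HEAD", path)          # headers only, no body
    res = conn.getresponse()
    content_type = res.getheader("Content-Type")
    conn.close()
    return content_type
```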

Brian McKenna
+1, yep, HEAD is indeed the way to go.
Alex Martelli
Is there anything wrong with mine? f = urllib2.urlopen(url) params['mime'] = f.headers['content-type']
TIMEX
@alex: yes, it will download the whole file.
Brian McKenna
@brian, please rephrase. It will not download the whole file.
ghostdog74
I have done some testing with ettercap. A HEAD request downloads about 400 bytes; the way alex suggested downloads the first 80k or so of the file and leaves the connection dangling.
gnibbler
+2  A: 

This is a higher-level answer than Brian's. Using the urllib machinery has the usual advantages, such as automatic redirect handling.

import urllib2

class HeadRequest(urllib2.Request):
    def get_method(self):
        return "HEAD"

url = "http://mydomain.com/thevideofile.mp4"
head = urllib2.urlopen(HeadRequest(url))
head.read()          # Returns an empty string and closes the connection
print head.headers.maintype
print head.headers.subtype
print head.headers.type
gnibbler
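On Python 3 the subclass isn't needed: `urllib2` became `urllib.request`, and `Request` accepts a `method` argument (since 3.3). A minimal sketch under those assumptions:

```python
# Python 3 sketch: Request takes a method argument, so no
# HeadRequest subclass is required.
import urllib.request

def head_content_type(url):
    """HEAD the URL and return its content type, e.g. "video/mp4"."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get_content_type()
```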
A: 

You can get the video type using the info() method or the headers dict:

f = urllib2.urlopen(url)
print f.headers['Content-Type']
print f.info()

A test run with a randomly selected AVI file found on the net that is more than 600 MB:

$ cat test.py
#!/usr/bin/env python
import urllib2
url="http://www.merseypirates.com/rjnsteve/rjnsteve/oem16.avi"
f=urllib2.urlopen(url)
print f.headers['Content-Type']

$ time python test.py
video/x-msvideo

real    0m4.931s
user    0m0.115s
sys     0m0.042s

It will only "take up bandwidth" when the file is actually downloaded, i.e. when packets are being sent to and from the socket.

ghostdog74
That will download the whole file.
Brian McKenna
Download the whole file? As in download to local disk so I have an actual physical file? No, it won't. Besides, the OP is asking what's wrong with this method, so I am showing him where he went wrong.
ghostdog74
It will make a request that will **download** the whole file. Of course it won't be stored to your filesystem but the request will block and waste bandwidth for *no reason*.
Brian McKenna
No it won't. If you read the docs, urlopen returns a file-like object; that's why you can do things like response.read(). It's only when you read() that "bandwidth is wasted".
ghostdog74
Try it out. Download something like http://www.charlesproxy.com/ and see how much the request downloads in an active REPL. `urlopen` blocks until it gets the headers and `Content-Length` so it may *seem* instant but it's actually downloading the content in the background. When you `read`, Python blocks for the content. So **it uses up bandwidth** when you call `urlopen` - *just in the background*.
Brian McKenna
Then you should have defined "download the whole file" clearly. With an AVI file of 600+ MB, it only takes a few seconds to get those headers.
ghostdog74
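The disagreement above can be checked against a local test server instead of a 600 MB file. A sketch (the counting handler, names, and byte sizes are illustrative, and Python 3's urllib.request stands in for urllib2): with a GET, the server writes the body onto the wire whether or not the client ever calls read(); with a HEAD, nothing beyond the headers is sent.

```python
# Sketch: a local server records how many body bytes each handler wrote.
import http.server
import threading
import urllib.request

class Handler(http.server.BaseHTTPRequestHandler):
    body = b"x" * 1024                  # stand-in for the video content
    body_bytes = {"GET": 0, "HEAD": 0}  # body bytes written per method

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "video/mp4")
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()            # headers only, no body

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.body)     # body goes out regardless of read()
        Handler.body_bytes["GET"] += len(self.body)

    def log_message(self, *args):
        pass                            # keep the demo quiet

def compare_get_and_head():
    srv = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/file.mp4" % srv.server_address[1]
    urllib.request.urlopen(url).close()  # GET; client never calls read()
    urllib.request.urlopen(
        urllib.request.Request(url, method="HEAD")).close()
    srv.shutdown()
    return Handler.body_bytes
```

Under these assumptions, compare_get_and_head() returns {"GET": 1024, "HEAD": 0}: the GET body was transmitted even though the client discarded the response unread.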