ansaurus

Question

Answer 1

+1 A:

Use regular expressions.

import logging
import re

r_url = re.compile(r"^https?:")
r_image = re.compile(r".(jpg|png|gif)$")

# d_dict is the collection of dictionaries being checked
for dictionaries in d_dict:
    type = dictionaries.get('type')
    if r_url.match(type):
        logging.debug("type is url")
    elif r_image.match(type):
        logging.debug("type is image")
    else:
        logging.debug("invalid type")

Two remarks: type is a builtin, and images could be loaded from a URL too.

leoluk 2010-09-13 16:29:27

*now he has two problems*

SilentGhost 2010-09-13 16:31:13

@silent ghost: I will deal with it.

Rajeev 2010-09-13 16:32:53

@Rajeev: you sure will, and when it won't work you'll ask for help on SO.

SilentGhost 2010-09-13 16:35:21

@silent ghost: thanks, but again I will deal with it. I just needed a clue to start off with.

Rajeev 2010-09-13 16:36:50

@rajeev: and the code you had is not enough? Do you know enough of python to write what you have in syntactically-correct Python?

SilentGhost 2010-09-13 16:39:53

@silent ghost: No offense, but I thought the code was good enough for my requirements. Please let me know if anything is wrong in the above. I would surely appreciate it.

Rajeev 2010-09-13 16:42:01

This doesn't work. `.match` looks at the front of the string, and `.gif` will never match there. Also the `.` should be escaped. I've no idea why you'd go for regex here, when Python offers perfectly good `string.startswith(x)` and `.endswith(y)` methods.

bobince 2010-09-13 17:12:56
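
A minimal sketch of the startswith/endswith approach described in the comment above, written for Python 3. The function name classify, the two tuples of prefixes and extensions, and the sample values are assumptions made for illustration; they do not come from the thread.

```python
# Hypothetical helper: classify a string as a URL, an image filename, or neither.
URL_PREFIXES = ("http://", "https://")
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def classify(value):
    """Return 'url', 'image' or 'invalid' for a value taken from a dictionary."""
    if not isinstance(value, str):
        # A missing key gives None, which has no startswith/endswith.
        return "invalid"
    lowered = value.lower()
    # str.startswith and str.endswith both accept a tuple of candidates.
    if lowered.startswith(URL_PREFIXES):
        return "url"
    if lowered.endswith(IMAGE_EXTENSIONS):
        return "image"
    return "invalid"


if __name__ == "__main__":
    for sample in ("http://example.com/", "photo.GIF", "notes.txt", None):
        print(repr(sample), "->", classify(sample))
```

As in the original snippet, the URL test runs first, so an image served from a URL is reported as a URL. Whether that is the right priority depends on the requirements in the question.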

Answer 2

+3 A:

You cannot tell what type a resource is purely from its URL. It is perfectly valid to have a GIF file at a URL without a .gif file extension, or with a misleading file extension like .txt. In fact it is quite likely, now that URL-rewriting is popular, that you'll get image URLs with no file extension at all.

It is the Content-Type HTTP response header that governs what type a resource on the web is, so the only way you can find out for sure is to fetch the resource and see what response you get. You can do this by looking at the headers returned by urllib.urlopen(url).headers, but that actually fetches the file itself. For efficiency you may prefer to make a HEAD request that doesn't transfer the whole file:

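import logging
import urllib2  # Python 2 module; Python 3 moved this into urllib.request

class HeadRequest(urllib2.Request):
    # Make urllib2 send HEAD instead of GET
    def get_method(self):
        return 'HEAD'

response = urllib2.urlopen(HeadRequest(url))
# Drop any parameters such as "; charset=..." before comparing
maintype = response.headers['Content-Type'].split(';')[0].lower()
if maintype not in ('image/png', 'image/jpeg', 'image/gif'):
    logging.debug('invalid type')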
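
For readers on Python 3, here is a rough equivalent of the HEAD request idea, as a sketch rather than a definitive implementation. It assumes Python 3.3 or later, where urllib.request.Request accepts a method argument. The names IMAGE_TYPES, main_type, is_supported_image and head_content_type, and the 10 second timeout, are made up for this example.

```python
import urllib.request

# Assumed whitelist, copied from the types checked in the answer.
IMAGE_TYPES = ("image/png", "image/jpeg", "image/gif")


def main_type(content_type):
    """Strip parameters such as '; charset=...' and normalise case."""
    return content_type.split(";")[0].strip().lower()


def is_supported_image(content_type):
    """True if a Content-Type header value names one of IMAGE_TYPES."""
    return content_type is not None and main_type(content_type) in IMAGE_TYPES


def head_content_type(url, timeout=10):
    """Send a HEAD request and return the Content-Type header, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")
```

A caller would combine the two as is_supported_image(head_content_type(url)). urlopen raises on network and HTTP errors, so real code would probably wrap that call in a try/except.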

If you must try to sniff the type from the file extension in the URL's path part (e.g. because you don't have a net connection), you should parse the URL with urlparse first to remove any ?query or #fragment part, so that http://www.example.com/image.png?blah=blah&foo=.txt doesn't confuse it. You should also consider using mimetypes to map the filename to a Content-Type, so you can take advantage of its knowledge of file extensions:

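import logging
import mimetypes
import urlparse  # Python 2 module; Python 3 moved this into urllib.parse

# Use only the path, so a ?query or #fragment cannot affect the guess
maintype = mimetypes.guess_type(urlparse.urlparse(url).path)[0]
if maintype not in ('image/png', 'image/jpeg', 'image/gif'):
    logging.debug('invalid type')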

(For example, this allows alternative extensions. You should at the very least allow .jpeg for image/jpeg files, as well as the mutant three-letter Windows variant .jpg.)
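
The same extension-based guess can be sketched in Python 3, where the parsing functions live in urllib.parse. The helper names guess_type_from_url and looks_like_image and the IMAGE_TYPES tuple are assumptions for illustration only.

```python
import mimetypes
from urllib.parse import urlsplit

# Assumed whitelist, copied from the types checked in the answer.
IMAGE_TYPES = ("image/png", "image/jpeg", "image/gif")


def guess_type_from_url(url):
    """Guess a MIME type from the path part of a URL, or return None."""
    path = urlsplit(url).path  # leaves out any ?query and #fragment
    return mimetypes.guess_type(path)[0]


def looks_like_image(url):
    """True if the extension in the URL path maps to one of IMAGE_TYPES."""
    return guess_type_from_url(url) in IMAGE_TYPES


if __name__ == "__main__":
    print(guess_type_from_url("http://www.example.com/image.png?blah=blah&foo=.txt"))
```

Because only the path is inspected, the misleading .txt in the query string of the example URL is ignored. This remains a guess from the name, with the limits the answer already describes.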

bobince 2010-09-13 17:08:00

Answer 3

A:

If you are going to guess the type of a resource from its URL, then I suggest you use the mimetypes library. Realize, however, that you can really only make an educated guess this way.

As bobince suggests, you could also make a HEAD request and use the Content-Type header. This, however, assumes that the server is configured (or, in the case of a web application, programmed) to return the correct Content-Type. It might not be.

So, the only way to really tell is to download the file and use something like libmagic (although it is conceivable even that could fail). If you decide this degree of accuracy is necessary, you might be interested in this Python binding for libmagic.
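
To illustrate the idea of inspecting the content itself, here is a much smaller hand-rolled check in Python 3. It is not libmagic and does not use any libmagic binding: it only compares the first bytes of the data with the well-known signatures of PNG, JPEG and GIF files. The names SIGNATURES and sniff_image_type are assumptions for this sketch.

```python
# Leading "magic" bytes for the three formats discussed in this thread.
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def sniff_image_type(data):
    """Return a MIME type based on the first bytes of data, or None."""
    for signature, mime_type in SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None
```

Only the first few bytes are needed, so data can be a short prefix of the downloaded file. A matching signature does not prove the rest of the file is a valid image, which fits the caveat that even content sniffing can fail.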

Nathan Davis 2010-09-14 05:52:23


Python test for a url and image type
