So I have a recordset (SQLAlchemy) of products that I am looping over, and for each one I want to download an image and save it to a folder.

If the folder doesn't exist, I want to create it.

Also, I want to first check if the image file exists in the folder. If it does, don't download it; just skip that row.

/myscript.py
/images/

I want the images folder to be a folder in the same directory as my script file, wherever it may be stored.

I have so far:

import urllib2

q = session.query(products)

for p in q:
    if p.url:
        req = urllib2.Request(p.url)
        try:
            response = urllib2.urlopen(req)
            image = response.read()

            # ???
        except urllib2.URLError as e:
            print e
+2  A: 

I think you can just use urllib.urlretrieve here:

import errno
import os
import urllib

def require_dir(path):
    # Create the directory; ignore the error if it already exists.
    try:
        os.makedirs(path)
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise

directory = os.path.join(os.path.dirname(os.path.abspath(__file__)), "images")
require_dir(directory)
filename = os.path.join(directory, "stackoverflow.html")

if not os.path.exists(filename):
    urllib.urlretrieve("http://stackoverflow.com", filename)
Philipp
Should the path be /images/ or a full path from the root?
Blankman
My example will use a directory called `images` that resides in the same directory as the script file.
Philipp
+1  A: 

The filename should be in response.info()['Content-Disposition'], as a filename=something part after a semicolon in that string. If it is not there (that header is missing, has no semicolon, or has no filename part), you can use urlparse.urlsplit(p.url) and take the os.path.basename of the last non-blank path component (or, more pragmatically, though that would deeply offend purists, just p.url.split('/')[-1] ;-).
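As a rough sketch of the filename lookup described above: the helper name guess_filename and its regular expression are my own illustration, not part of any library. It handles only the plain filename= form of the header, and the import fallback is there so it runs on both Python 2 (as in the question) and Python 3.

```python
import os
import re

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2, as used in the question


def guess_filename(url, content_disposition=None):
    """Pick a file name from a Content-Disposition value, else from the URL."""
    if content_disposition:
        # Look for filename=... after a semicolon; the quotes are optional.
        match = re.search(r';\s*filename="?([^";]+)"?', content_disposition)
        if match:
            # basename() drops any directory part the server might send.
            name = os.path.basename(match.group(1).strip())
            if name:
                return name
    # Fall back to the last non-blank component of the URL path.
    parts = [part for part in urlsplit(url).path.split("/") if part]
    return parts[-1] if parts else None
```

In the question's loop this could be called as guess_filename(p.url, response.info().get('Content-Disposition')); it returns None when neither source yields a name, so the caller has to decide what to do in that case.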

So much for the filename; call it, e.g., fn.

The directory where your script lives is sd = os.path.dirname(__file__).

Its images subdirectory is therefore clearly sdsd = os.path.join(sd, 'images').

To check if that subdirectory exists, and make it otherwise,

if not os.path.exists(sdsd): os.mkdir(sdsd)

To check if the file you want to write already exists,

if os.path.exists(os.path.join(sdsd, fn)): ...

All of this code goes where you have ???. It's a lot, so it's clearly better to make it a function taking p.url and response as arguments (it can read image on its own;-) and possibly taking __file__ as well if you want the freedom to move that function into its own separate module later (I'd recommend that!).
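One possible shape for such a function, as a sketch only: the name save_if_missing and its parameters are hypothetical. It takes the filename (fn above), a callable that returns the image bytes, and the script's __file__, so the response body is read only when the file is actually missing.

```python
import os


def save_if_missing(filename, read_data, script_file):
    """Write read_data() to <script dir>/images/<filename> unless it exists.

    Returns the target path if a file was written, or None if it was skipped.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    images_dir = os.path.join(script_dir, "images")
    # Create the images folder next to the script if it is not there yet.
    if not os.path.isdir(images_dir):
        os.makedirs(images_dir)
    target = os.path.join(images_dir, filename)
    if os.path.exists(target):
        return None  # already downloaded, so skip this row
    # Binary mode, since image data is bytes rather than text.
    with open(target, "wb") as handle:
        handle.write(read_data())
    return target
```

Inside the loop this might be called as save_if_missing(fn, response.read, __file__). Passing __file__ in explicitly keeps the function usable from a separate module, as suggested above.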

Of course, you need to import os for all those os and os.path calls, and also import urlparse if you decide to use the latter standard library module.

Alex Martelli