ansaurus

Question

How do I download the image files from an HTML page's source using Python?

Answer 1

+5 A:

You have to download the page, parse the HTML document, find your image (with a regex, for example) and download it. You can use urllib2 for downloading and Beautiful Soup for parsing the HTML file.

2008-11-02 21:33:53
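A minimal Python 3 sketch of those steps, assuming only the standard library: urllib.request takes the place of urllib2, and a regular expression stands in for Beautiful Soup. The regex is deliberately naive (it misses unquoted src values, for instance), and the function names and default output folder are illustrative, not from the answer.

```python
import os
import re
import urllib.request
from urllib.parse import urljoin, urlparse

# Naive pattern for <img ... src="..."> (single or double quotes). Requiring
# whitespace before "src" keeps attributes such as data-src from matching.
IMG_SRC_RE = re.compile(r"""<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']""",
                        re.IGNORECASE)


def find_image_urls(html, page_url):
    """Return absolute URLs for every <img src=...> found in html."""
    return [urljoin(page_url, src) for src in IMG_SRC_RE.findall(html)]


def fetch_page(page_url):
    """Download page_url and decode it, falling back to UTF-8."""
    with urllib.request.urlopen(page_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def download_images(page_url, out_folder="."):
    """Save every image referenced by page_url into out_folder."""
    for image_url in find_image_urls(fetch_page(page_url), page_url):
        filename = os.path.basename(urlparse(image_url).path)
        if filename:  # skip URLs that do not end in a file name
            urllib.request.urlretrieve(image_url, os.path.join(out_folder, filename))
```

urljoin resolves relative src values against the page URL, so "img/a.png" on http://example.com/dir/page.html becomes http://example.com/dir/img/a.png.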

Answer 2

+1 A:

Use htmllib to extract all img tags (override do_img), then use urllib2 to download all the images.

Martin v. Löwis 2008-11-02 21:34:28

This assumes well-formed HTML, whereas Beautiful Soup can also cope with broken markup.

Ali A 2008-11-02 21:51:28

On the other hand, this uses only standard library modules.

ΤΖΩΤΖΙΟΥ 2008-11-02 22:57:44
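htmllib and urllib2 are Python 2 modules that no longer exist in Python 3. A rough Python 3 equivalent of this answer, still standard library only, subclasses html.parser.HTMLParser and overrides handle_starttag where htmllib used do_img. The class and function names here are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collect absolute URLs of <img> tags; handle_starttag plays the role of do_img."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs. Self-closing <img ... /> tags also end up here
        # through the default handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # ignore <img> tags without a usable src
                self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Feed html through ImgCollector and return the collected URLs."""
    parser = ImgCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls
```

Each collected URL can then be downloaded with urllib.request.urlretrieve, the Python 3 location of the old urllib.urlretrieve.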

Answer 3

+12 A:

Here is some code that downloads all the images from the supplied URL and saves them in the specified output folder. You can modify it to suit your needs.

"""
dumpimages.py
    Downloads all the images on the supplied URL and saves them to the
    specified output folder ("/test/" by default)

Usage:
    python dumpimages.py http://example.com/ [output]
"""

from BeautifulSoup import BeautifulSoup as bs
import urlparse
from urllib2 import urlopen
from urllib import urlretrieve
import os
import sys

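def main(url, out_folder="/test/"):
    """Downloads all the images at 'url' to 'out_folder'"""
    soup = bs(urlopen(url))

    for image in soup.findAll("img"):
        src = image.get("src")
        if not src:
            continue  # skip <img> tags without a src attribute
        print "Image: %s" % src
        # urljoin resolves relative paths ("a.png", "/img/a.png") against the
        # page URL and leaves absolute image URLs unchanged.
        image_url = urlparse.urljoin(url, src)
        filename = urlparse.urlparse(image_url).path.split("/")[-1]
        outpath = os.path.join(out_folder, filename)
        urlretrieve(image_url, outpath)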

def _usage():
    print "usage: python dumpimages.py http://example.com [outpath]"

if __name__ == "__main__":
    args = sys.argv[1:]
    # Expected arguments: URL [output folder]
    if not 1 <= len(args) <= 2 or not args[0].lower().startswith("http"):
        _usage()
        sys.exit(-1)
    url = args[0]
    out_folder = args[1] if len(args) == 2 else "/test/"
    main(url, out_folder)

Edit: You can specify the output folder now.

Ryan Ginstrom 2008-11-03 12:40:27

`open(..).write(urlopen(..))` could be replaced by `urllib.urlretrieve()`

J.F. Sebastian 2008-11-03 12:48:53

Thanks for pointing that out. I've edited the code to reflect that.

Ryan Ginstrom 2008-11-03 22:23:06
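Both options from the comment above carry over to Python 3, where urlopen and urlretrieve live in urllib.request. A minimal side-by-side sketch (the function names are hypothetical); note that the Python 3 documentation lists urlretrieve under its legacy interface.

```python
import urllib.request


def save_with_open(url, path):
    """Read the whole response into memory, then write it out with open()."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())


def save_with_urlretrieve(url, path):
    """Let urlretrieve open the URL and write the local file in one call."""
    urllib.request.urlretrieve(url, path)
```

The `with` form in the first function also closes the response and the file, which a bare `open(..).write(..)` one-liner leaves to the garbage collector.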

Answer 4

A:

And here is a function for downloading a single image:

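import os
import urllib

# DOWNLOADED_IMAGE_PATH (the target folder) is assumed to be defined elsewhere.

def download_photo(img_url, filename):
    """Download img_url in 64 KiB chunks; return the local file path."""
    file_path = os.path.join(DOWNLOADED_IMAGE_PATH, filename)
    image_on_web = urllib.urlopen(img_url)
    downloaded_image = open(file_path, "wb")
    try:
        while True:
            buf = image_on_web.read(65536)
            if not buf:  # an empty read means the download is complete
                break
            downloaded_image.write(buf)
    finally:
        # close both handles even if reading or writing fails
        downloaded_image.close()
        image_on_web.close()

    return file_path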

Dingo 2010-03-15 15:35:20
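A Python 3 sketch of the same chunked download: shutil.copyfileobj runs the read/write loop, and `with` closes both the response and the file even if an error occurs. Passing the target folder in as a parameter, instead of the DOWNLOADED_IMAGE_PATH global, is an assumption about how the function would be used.

```python
import os
import shutil
import urllib.request


def download_photo(img_url, folder, filename, chunk_size=65536):
    """Stream img_url to folder/filename chunk_size bytes at a time; return the path."""
    file_path = os.path.join(folder, filename)
    with urllib.request.urlopen(img_url) as image_on_web, \
            open(file_path, "wb") as downloaded_image:
        # Same loop as the original: read up to chunk_size bytes, write them,
        # and stop once read() returns an empty bytes object.
        shutil.copyfileobj(image_on_web, downloaded_image, chunk_size)
    return file_path
```

Streaming in chunks keeps memory use flat for large images, unlike reading the whole response with a single read().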
