views: 140

answers: 4

A few years ago I helped someone put together a webpage (for local personal use only, not served to the world) that aggregates outdoor webcam photos from several of his favorite websites. It's a time-saver for viewing multiple websites at once. We had it easy when the images on those websites had fixed URLs, and we were able to write some JavaScript when the URLs changed predictably (e.g., when the URL had a date in it). But now he'd like to add an image whose filename changes seemingly at random, and I don't know how to handle that. Basically, I'd like to:

  1. Programmatically visit another website to find the URL of a particular image.
  2. Insert that URL into my webpage with an <img> tag.

I realize this is probably a confusing and unusual question. I'm willing to help clarify as much as possible. I'm just not sure how to ask for what this guy wants to do.

Update: David Dorward mentioned that doing this with JavaScript violates the Same Origin Policy. I'm open to suggestions for other ways to approach this problem.

A: 

If you use PHP in your project, you can use the cURL library to fetch the other website's content, then parse it with a regex to extract the image URL from the source code.
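A minimal sketch of the same fetch-and-regex idea in Python instead of PHP (standard library only; the page URL and the regex are placeholders to adapt to the real page's markup):

```python
import re
import urllib.request

def extract_image_url(html):
    """Pull the first <img> src out of a known, stable chunk of HTML.

    A plain regex like this can work for one predictable page, but it
    is not a general HTML parser.
    """
    match = re.search(r'<img[^>]+src=["\']([^"\']+)["\']', html)
    return match.group(1) if match else None

def fetch_image_url(page_url):
    # page_url is a placeholder; point it at the page hosting the image.
    with urllib.request.urlopen(page_url) as resp:
        return extract_image_url(resp.read().decode("utf-8", errors="replace"))
```

Running this server-side (PHP, Python, etc.) also sidesteps the Same Origin Policy problem mentioned in the question's update, since the cross-site fetch happens outside the browser.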

antyrat
I wouldn't use regex to parse HTML.
Oren
Can we stop with that knee-jerk response? I wouldn't use regular expressions to parse any possible arbitrary HTML, but why wouldn't you use it to parse an expected HTML string?
Tom
Because it's 'expected' and sometimes, our expectations let us down.
belugabob
@belugabob If the HTML is not "as expected" then you are going to fail to extract the URL, whatever method you use. That's not an argument against regexes.
MarkJ
+1  A: 

It's probably a big fat violation of copyright.

The picture is most likely contained within a page - just regularly visit that page and parse its img tags. Make sure that the random bit you mentioned is not just a cache-busting parameter added to force browsers to fetch a fresh image instead of retrieving a cached version.

kime waza
Yes, that's why I noted "personal use only", not served out to the world. It's a time-saver to look at several webpages at the same time.
Kristo
It's not a copyright violation if the image is offered to the world, i.e., placed on a publicly accessible website. There are enough ways to deny access. Some are even considered effective enough that bypassing them would violate the DMCA, but for a basic copyright claim that's not even needed.
MSalters
Being on a publicly available website does not mean that the images cannot be copyrighted. 'Copyright' means just that: the right to copy. It is not - nor should it be expected to be - superseded by the image being publicly visible. If you took a photograph of a copyrighted painting that was temporarily displayed in a public place, you wouldn't have any right to sell copies of that photograph - the same applies to the web.
belugabob
A: 
  1. Fetch html of remote page using Cross Domain AJAX.
  2. Then parse it to get urls of images of interest.
  3. Then for each url do <img src=url />
TheMachineCharmer
Why the downvote?
TheMachineCharmer
I'm guessing it's because the question is about photos on various other web pages, not the one he has control over.
Tom
This will also get every other image on the page, which could be LOTS of images - your problem has then changed to 'Out of a load of URLs, how do I find the ones that I'm interested in?'
belugabob
:) Yeah, got it. Answer changed, friends. Give it a try. Two more downvotes and I'll delete it. :)
TheMachineCharmer
A: 

You have a Python question in your profile, so I'll just say that if I were trying to do this, I'd go with Python and Beautiful Soup. It has the added advantage of being able to handle invalid HTML.
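For example, a rough sketch with Beautiful Soup (the page URL and the "webcam" filename filter are assumptions - substitute whatever actually identifies the image on the real page):

```python
import urllib.request

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def find_webcam_url(html, marker):
    """Return the src of the first <img> whose src contains marker.

    marker is a hypothetical distinguishing substring (e.g. "webcam");
    adjust the filter to match whatever is stable in the real markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):
        src = img.get("src", "")
        if marker in src:
            return src
    return None

def fetch_webcam_url(page_url, marker):
    # page_url is a placeholder for the page hosting the webcam image.
    with urllib.request.urlopen(page_url) as resp:
        return find_webcam_url(resp.read().decode("utf-8", errors="replace"), marker)
```

The parser-based approach keeps working even if attribute order or whitespace changes, which is where hand-rolled regexes tend to break.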

Tom