tags:
views: 247
answers: 5

The URL http://www.fourmilab.ch/cgi-bin/Earth shows a live map of the Earth.

If I issue this URL in my browser (Firefox), the image shows up just fine. But when I try to fetch the same page with wget, I fail!

Here's what I tried first:

wget -p http://www.fourmilab.ch/cgi-bin/Earth

Thinking that all the other form fields were probably required too, I did a 'View Source' on the above page, noted down the various field values, and then issued the following command:

wget --post-data "opt=-p&lat=7°27'&lon=50°49'&ns=North&ew=East&alt=150889769&img=learth.evif&date=1&imgsize=320&daynight=-d" http://www.fourmilab.ch/cgi-bin/Earth

Still no image!

Can someone please tell me what is going on here...? Are there any 'gotchas' with CGI and/or form-POST based wgets? Where (book or online resource) would such concepts be explained?

+2  A: 

If you inspect the page's source code, there's an img tag that references the image of the Earth. For example:

<img 
 src="/cgi-bin/Earth?di=570C6ABB1F33F13E95631EFF088262D5E20F2A10190A5A599229" 
 ismap="ismap" usemap="#zoommap" width="320" height="320" border="0" alt="" /> 

Without giving the 'di' parameter, you are just asking for the whole web page, with references to this image, not for the image itself.

Edit: The 'di' parameter encodes which "part" of the Earth you want to receive. Anyway, try for example:

wget http://www.fourmilab.ch/cgi-bin/Earth?di=F5AEC312B69A58973CCAB756A12BCB7C47A9BE99E3DDC5F63DF746B66C122E4E4B28ADC1EFADCC43752B45ABE2585A62E6FB304ACB6354E2796D9D3CEF7A1044FA32907855BA5C8F

Ravadre
Yes, I saw that too. But, as I said in my comment to Brad's response, this di value changes on almost every page refresh. So the question is: how do I find out the image URL before I can wget it?
Harry
and you get a file with a long name 'Earth\?di\=F5AEC...' which is a jpeg. Very clever!
pavium
@somedeveloper: di changes only when you change the polar coordinates at which you want to get the image. If you want some specific coords, you should parse the result from the first page, which is what you actually did two answers below. @pavium: file names can be changed; most programs can stream data to stdout, and from there it can be redirected to any file or device, so this isn't such a problem.
Ravadre
+1  A: 

Use GET instead of POST. They're completely different for the CGI program in the background.
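Brad's suggestion in command form might look like the sketch below. The field values are copied from the question; the percent-encoding of the degree signs (° is %C2%B0 in UTF-8) is my addition, since raw non-ASCII characters don't belong in a URL:

```shell
# Sketch: send the same form fields as a GET query string instead of --post-data.
base="http://www.fourmilab.ch/cgi-bin/Earth"
query="opt=-p&lat=7%C2%B027'&lon=50%C2%B049'&ns=North&ew=East&alt=150889769&img=learth.evif&imgsize=320&daynight=-d"
# Quote the URL so the shell does not interpret ? and &.
echo wget -O earth.html "$base?$query"
```

Note that the whole URL is quoted; otherwise the shell would treat `&` as a background operator.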

Brad Bruce
A: 

What you are downloading is the whole HTML page, not the image. To download the image and other page elements too, you'll need to use the --page-requisites (and possibly --convert-links) parameter(s). Unfortunately, because robots.txt disallows access to URLs under /cgi-bin/, wget will not download the image located under /cgi-bin/ by default; it honors the robots protocol unless told to ignore it with '-e robots=off'.
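For what it's worth, wget does accept a switch to ignore the robots protocol. A sketch of such an invocation, assuming bypassing robots.txt is acceptable for a one-off fetch:

```shell
# Sketch: wget obeys robots.txt by default; `-e robots=off` tells it to ignore
# the robots protocol, so --page-requisites can fetch files under /cgi-bin/.
cmd="wget -e robots=off --page-requisites --convert-links http://www.fourmilab.ch/cgi-bin/Earth"
echo "$cmd"
```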

Cristian Ciupitu
+1  A: 

Following on from Ravadre,

wget -p http://www.fourmilab.ch/cgi-bin/Earth

downloads an XHTML file which contains an <img> tag.

I edited the XHTML to remove everything but the img tag, and turned it into a bash script containing another wget -p command, escaping the ? and =.

When I executed this I got a 14kB file which I renamed earth.jpg

Not really programmatic, the way I did it, but I think it could be done.
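The manual step might look like the sketch below, using the di value shown in the first answer (quoting the URL works just as well as backslash-escaping the ? and =):

```shell
# Sketch: the image URL copied from the <img> tag, quoted so the shell leaves
# the ? and = alone. This particular di value goes stale quickly.
url="http://www.fourmilab.ch/cgi-bin/Earth?di=570C6ABB1F33F13E95631EFF088262D5E20F2A10190A5A599229"
echo wget -p "$url"
```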

But as @somedeveloper said, the di value is changing (since it depends on time).

pavium
Thanks, I'll wrap that up in a script.
Harry
A: 

Guys, here's what I finally did. I'm not fully happy with this solution, as I was (and still am) hoping for a better way: one that gets the image with the first wget itself, giving me the same user experience I get when browsing via Firefox.

#!/bin/bash

tmpf=/tmp/delme.jpeg
base=http://www.fourmilab.ch

# Fetch the page once, then pull the /cgi-bin/Earth?di=... image URL out of its <img> tag.
liveurl=$(wget -O - "$base/cgi-bin/Earth?opt=-p" 2>/dev/null | perl -0777 -nle 'if(m@<img \s+ src \s* = \s* "(/cgi-bin/Earth\?di= .*? )" @gsix) { print "$1\n" }')

# liveurl already begins with a slash, so concatenate without adding another.
wget -O "$tmpf" "$base$liveurl" &>/dev/null
Harry