tags:
views: 247
answers: 5

The URL http://www.fourmilab.ch/cgi-bin/Earth shows a live map of the Earth.

If I issue this URL in my browser (Firefox), the image shows up just fine. But when I try to fetch the same page with wget, I fail!

Here's what I tried first:

wget -p http://www.fourmilab.ch/cgi-bin/Earth

Thinking that all the other form fields were probably required too, I did a 'View Source' on the above page, noted down the various field values, and then issued the following command:

wget --post-data "opt=-p&lat=7°27'&lon=50°49'&ns=North&ew=East&alt=150889769&img=learth.evif&date=1&imgsize=320&daynight=-d" http://www.fourmilab.ch/cgi-bin/Earth

Still no image!

Can someone please tell me what is going on here...? Are there any 'gotchas' with CGI and/or form-POST based wgets? Where (book or online resource) would such concepts be explained?

+2  A: 

If you inspect the page's source code, there's an img tag that references the image of the Earth. For example:

<img 
 src="/cgi-bin/Earth?di=570C6ABB1F33F13E95631EFF088262D5E20F2A10190A5A599229" 
 ismap="ismap" usemap="#zoommap" width="320" height="320" border="0" alt="" /> 

Without giving the 'di' parameter, you are just asking for the whole web page, with references to this image, not for the image itself.

Edit: The 'di' parameter encodes which "part" of the Earth you want to receive. Anyway, try for example:

wget http://www.fourmilab.ch/cgi-bin/Earth?di=F5AEC312B69A58973CCAB756A12BCB7C47A9BE99E3DDC5F63DF746B66C122E4E4B28ADC1EFADCC43752B45ABE2585A62E6FB304ACB6354E2796D9D3CEF7A1044FA32907855BA5C8F

Ravadre
Yes, I saw that too. But, as I said in my comment to Brad's response, this di value changes on almost every page refresh. So the question is: how do I find out the image URL before I can wget it?
Harry
and you get a file with a long name 'Earth\?di\=F5AEC...' which is a jpeg. Very clever!
pavium
@somedeveloper: di changes only when you change the polar coordinates at which you want to get the image. If you want some specific coords, you should parse the result from the first page, which is what you actually did two answers below. @pavium: file names can be changed; most programs can stream data to stdout, and from there it can be redirected to any file or device, so this isn't such a problem.
Ravadre
+1  A: 

Use GET instead of POST. They're completely different for the CGI program in the background.
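Brad's suggestion in command form might look like the sketch below. The field values are copied from the question; the percent-encoding of the degree signs (° is %C2%B0 in UTF-8) is my addition, since raw non-ASCII characters don't belong in a URL:

```shell
# Sketch: send the same form fields as a GET query string instead of --post-data.
base="http://www.fourmilab.ch/cgi-bin/Earth"
query="opt=-p&lat=7%C2%B027'&lon=50%C2%B049'&ns=North&ew=East&alt=150889769&img=learth.evif&imgsize=320&daynight=-d"
# Quote the URL so the shell does not interpret ? and &.
echo wget -O earth.html "$base?$query"
```

Note that the whole URL is quoted; otherwise the shell would treat `&` as a background operator.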

Brad Bruce
A: 

What you are downloading is the whole HTML page, not the image. To download the image and other page elements too, you'll need to use the --page-requisites (and possibly --convert-links) parameter(s). Unfortunately, because robots.txt disallows access to URLs under /cgi-bin/, wget will not download the image located under /cgi-bin/ by default; it honors the robots protocol unless told to ignore it with '-e robots=off'.
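For what it's worth, wget does accept a switch to ignore the robots protocol. A sketch of such an invocation, assuming bypassing robots.txt is acceptable for a one-off fetch:

```shell
# Sketch: wget obeys robots.txt by default; `-e robots=off` tells it to ignore
# the robots protocol, so --page-requisites can fetch files under /cgi-bin/.
cmd="wget -e robots=off --page-requisites --convert-links http://www.fourmilab.ch/cgi-bin/Earth"
echo "$cmd"
```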

Cristian Ciupitu
+1  A: 

Following on from Ravadre,

wget -p http://www.fourmilab.ch/cgi-bin/Earth

downloads an XHTML file which contains an <img> tag.

I edited the XHTML to remove everything but the img tag, and turned it into a bash script containing another wget -p command, escaping the ? and =.

When I executed this I got a 14kB file which I renamed earth.jpg

Not really programmatic, the way I did it, but I think it could be done.
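The manual step might look like the sketch below, using the di value shown in the first answer (quoting the URL works just as well as backslash-escaping the ? and =):

```shell
# Sketch: the image URL copied from the <img> tag, quoted so the shell leaves
# the ? and = alone. This particular di value goes stale quickly.
url="http://www.fourmilab.ch/cgi-bin/Earth?di=570C6ABB1F33F13E95631EFF088262D5E20F2A10190A5A599229"
echo wget -p "$url"
```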

But as @somedeveloper said, the di value is changing (since it depends on time).

pavium
Thanks, I'll wrap that up in a script.
Harry
A: 

Guys, here's what I finally did. I'm not fully happy with this solution, as I was (and still am) hoping for a better way: one that gets the image with the first wget itself, giving me the same user experience I get when browsing via Firefox.

#!/bin/bash

tmpf=/tmp/delme.jpeg
base=http://www.fourmilab.ch

# Fetch the page once, then pull the /cgi-bin/Earth?di=... image URL out of its <img> tag.
liveurl=$(wget -O - "$base/cgi-bin/Earth?opt=-p" 2>/dev/null | perl -0777 -nle 'if(m@<img \s+ src \s* = \s* "(/cgi-bin/Earth\?di= .*? )" @gsix) { print "$1\n" }')

# liveurl already begins with a slash, so concatenate without adding another.
wget -O "$tmpf" "$base$liveurl" &>/dev/null
Harry