tags:
views: 50
answers: 1

I don't know how to automatically retrieve and save an image to my hard disk given this HTML source:

<img src="https://someRemoteServer/servlet/GetItemServlet?ts=32352.3&amp;itemtype=blabla">

I tried wget, but it only saves the GetItemServlet request itself to my hard disk.

I want to iterate through 700 images on the remote server (which I neither own nor have backend access to) and save them all to my hard disk with a unique identifier.
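Roughly, I want to end up with something like the following. This is only a sketch: the id parameter, the 1 to 700 range, and the .jpg extension are guesses on my part, since I don't know the servlet's real interface.

    #!/bin/bash
    # Sketch only: iterate over 700 items and save each under a short, unique
    # name. The "id" parameter and the ".jpg" extension are assumptions.
    for i in $(seq 1 700); do
        wget -O "item_${i}.jpg" \
            "https://someRemoteServer/servlet/GetItemServlet?ts=32352.3&itemtype=blabla&id=${i}"
    done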

Edit: The output of wget:

HTTP request sent, awaiting response... 200 OK
Length: 0 [text/plain]
Saving to: »GetItemServlet?ts=32352.3«

The file itself is 0 KB in size.

When wget is resolving the many parameters, it prints:

[29] 48426
[30] 48427
--2010-08-16 21:52:02--  https://media.myRemoteServer.com/servlet/GetItemServlet?ts=56555
-bash: 1281978458512=1: command not found

but then it continues:

[2]   Done                    itemtype_text=[Keine+Auswahl]
..
[29]-  Done                    id=9
[30]+  Done                    res=2
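Apparently the unquoted & characters are the culprit here: bash treats each & as its background operator, forks a job for every fragment (hence the [29] 48426 lines), and then tries to run assignments like itemtype=blabla as commands of their own, which would explain the "command not found". Quoting the URL should hand it to wget in one piece:

    # Unquoted: the shell splits the command at every & and backgrounds the
    # pieces; the query parameters never reach wget.
    wget https://someRemoteServer/servlet/GetItemServlet?ts=32352.3&itemtype=blabla

    # Quoted: the whole URL is passed to wget untouched.
    wget "https://someRemoteServer/servlet/GetItemServlet?ts=32352.3&itemtype=blabla"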

Edit 2:

After escaping the ampersands and question marks, wget gets further and throws a completely different error message:

File name too long

and

Cannot write to >GetItemServlet?ts=32352.3&itemtype=blabla< (Unknown error: 0) 

Edit 3: Never-ending story. This should be on Super User anyway. I shortened the URL and now it runs through and indeed saves a file. That file, which appears to be HTML, says my session has run out and I need to log on. I did provide a username and password to wget, though. But when logging on to that site manually via a browser, you have to go through a form login. It seems wget can log in through an HTML form via a stored cookie: http://3.ly/cYqh. I will try that.
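Based on that link, something like the following should do the form login once and reuse the session cookie for the downloads. The login URL and the field names username/password are guesses; I'll have to read the real ones out of the login form's HTML.

    # Hypothetical: submit the login form once and store the session cookie.
    # The URL and the "username"/"password" field names are assumptions.
    wget --save-cookies cookies.txt --keep-session-cookies \
         --post-data 'username=myuser&password=mypass' \
         -O /dev/null "https://media.myRemoteServer.com/login"

    # Reuse the stored cookie for the actual image requests.
    wget --load-cookies cookies.txt -O item_32352.jpg \
         "https://media.myRemoteServer.com/servlet/GetItemServlet?ts=32352.3&itemtype=blabla"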

It's more than overdue to move this thread to Super User:

http://3.ly/ukr9

+1  A: 

It should work just fine. Maybe leechers are auto-detected and served a different response. Since you didn't describe the actual response in detail, this is a bit of a stab in the dark. Try supplying a legitimate user agent, maintaining the session, or using a somewhat smarter third-party leeching tool.
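For the first two suggestions, something along these lines; the user agent string is just an example of a browser-like one:

    # Example: present a browser-like user agent and keep session cookies
    # across requests instead of announcing yourself as plain wget.
    wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6" \
         --save-cookies cookies.txt --keep-session-cookies \
         "https://someRemoteServer/servlet/GetItemServlet?ts=32352.3&itemtype=blabla"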

That said, do you realize that most webmasters don't really appreciate this kind of action? Network bandwidth and CPU load are not free.


Update as per your update: the name seems to be too long to be a legitimate save-as filename, and ? is an illegal character in filenames (at least on Windows). That might be the root cause of all this. I don't use wget myself, but you should at least specify a custom output file name; it's explained in its manual. This question is now probably a better fit for http://superuser.com.
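A minimal, untested sketch of the custom output name with wget's -O option (as said, I don't use wget, so double-check against the manual):

    # -O sets the output filename explicitly, so the long, ?-riddled name
    # derived from the URL never reaches the filesystem.
    wget -O item_32352.jpg "https://someRemoteServer/servlet/GetItemServlet?ts=32352.3&itemtype=blabla"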

BalusC
HTTP request sent, awaiting response... 200 OK. Length: 0 [text/plain]. Saving to: »GetImageServlet?ts=56555.2«. The file itself is 0 KB in size. In the real-world example there are a lot of parameters after the first one, "?item". This is official work, so I'm authorized to do it.
Stephan Kristyn
I'm giving you the answer in recognition of your efforts and to close this thread.
Stephan Kristyn