wget is helpful in my data mining projects. Today I tried to download the following page with wget. Its content length is unspecified, and the connection hangs until I terminate the process. I tried the options -T, --connect-timeout, --read-timeout and --no-http-keep-alive, and none of them worked. I searched Google for an answer and read the wget man page, but found no solution. Someone hinted that the issue may be a bug in specific versions; I don't know. I am posting my question here in case someone knows the answer.
BTW, my OS is Ubuntu 10.04 LTS (Lucid Lynx) for i386.
wget --connect-timeout 3 --read-time 3 --debug http://www.crvanguard.com.cn/custom/crv/sales/hb.jsp?province=101&city=1010001&shop=0&sale_type=0&pageNo=1
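Because the URL in the command above is not quoted, each `&` in it is also a shell operator that runs the preceding command in the background, so it is worth checking exactly which arguments reach wget. A minimal sketch, assuming a POSIX `sh`; `show_args` is a hypothetical stand-in for wget that only prints the arguments it receives, one per line. Here the URL is single-quoted and the timeout options from the first paragraph are spelled out in full:

```shell
#!/bin/sh
# Hypothetical stand-in for wget: prints each argument it receives,
# one per line, so we can see how the shell split the command line.
show_args() {
  printf '%s\n' "$@"
}

# The URL is in single quotes, so the shell does not treat '&' as the
# "run in background" operator; the whole query string stays one argument.
show_args --connect-timeout=3 --read-timeout=3 --no-http-keep-alive --debug \
  'http://www.crvanguard.com.cn/custom/crv/sales/hb.jsp?province=101&city=1010001&shop=0&sale_type=0&pageNo=1'
```

Run as written, the sketch prints five lines, the last being the full URL ending in `pageNo=1`. Replacing `show_args` with `wget` gives the corresponding download command with the query string intact.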
Here is the debug output (some Chinese text in it has been translated into English):
DEBUG output created by Wget 1.12 on linux-gnu.
--2010-06-17 19:18:29-- http://www.crvanguard.com.cn/custom/crv/sales/hb.jsp?province=101
Resolving host www.crvanguard.com.cn... 219.134.63.193
Caching www.crvanguard.com.cn => 219.134.63.193
Connecting www.crvanguard.com.cn|219.134.63.193|:80... connected.
Created socket 3.
Releasing 0x09b79090 (new refcount 1).
---request begin---
GET /custom/crv/sales/hb.jsp?province=101 HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: www.crvanguard.com.cn
Connection: Keep-Alive
---request end---
HTTP request sent, waiting for response...
---response begin---
HTTP/1.1 200 OK
Date: Thu, 17 Jun 2010 11:09:10 GMT
Server: IBM_HTTP_Server
Surrogate-Control: no-store
Set-Cookie: JSESSIONID=0000I2ewO_IHpH5Kly3d8DKm6vn:-1; Path=/
Expires: Thu, 01 Dec 1994 16:00:00 GMT
Cache-Control: no-cache="set-cookie, set-cookie2"
Connection: close
Content-Type: text/html; charset=GBK
Content-Language: zh-CN
---response end---
200 OK
Stored cookie www.crvanguard.com.cn -1 (ANY) / [expiry none] JSESSIONID 0000I2ewO_IHpH5Kly3d8DKm6vn:-1
Content-length: unspecified [text/html]
Saving to: “hb.jsp?province=101.1”
[ ] 157,669 210K/s in 0.7s
Closed fd 3
2010-06-17 19:18:29 (210 KB/s) - “hb.jsp?province=101.1” saved [157669]
^C
[10] Done wget --connect-timeout 3 --read-time 3 --debug http://www.crvanguard.com.cn/custom/crv/sales/hb.jsp?province=101
[11] Done city=1010001
[12] Done shop=0
[13] Done sale_type=0
It seems that wget cannot close the connection even though I specified the timeout options and disabled HTTP keep-alive.
Am I using the wrong options? Is it a bug? Thanks in advance.