I have to download some files from an FTP server. Seems prosaic enough. However, when a file is very large, this server leaves the connection hanging after the download has ostensibly completed.
How can I handle this gracefully using ftplib in Python?
Sample Python code:
from ftplib import FTP
...
ftp = FTP(host)
ftp.login(login, passwd)
files = ftp.nlst()
ftp.set_debuglevel(2)
for fname in files:
    # Open the target in a with-block so each file is closed after its transfer
    with open(fname, 'wb') as f:
        ret_status = ftp.retrbinary('RETR ' + fname, f.write)
Debug output from the above:
*cmd* 'TYPE I'
*put* 'TYPE I\r\n'
*get* '200 Type set to I.\r\n'
*resp* '200 Type set to I.'
*cmd* 'PASV'
*put* 'PASV\r\n'
*get* '227 Entering Passive Mode (0,0,0,0,10,52).\r\n'
*resp* '227 Entering Passive Mode (0,0,0,0,10,52).'
*cmd* 'RETR some_file'
*put* 'RETR some_file\r\n'
*get* '125 Data connection already open; Transfer starting.\r\n'
*resp* '125 Data connection already open; Transfer starting.'
[just sits there indefinitely]
This is what it looks like when I attempt the same download using curl -v:
* About to connect() to some_server port 21 (#0)
* Trying some_ip... connected
* Connected to some_server (some_ip) port 21 (#0)
< 220 Microsoft FTP Service
> USER some_user
< 331 Password required for some_user.
> PASS some_password
< 230 User some_user logged in.
> PWD
< 257 "/some_dir" is current directory.
* Entry path is '/some_dir'
> EPSV
* Connect data stream passively
< 500 'EPSV': command not understood
* disabling EPSV usage
> PASV
< 227 Entering Passive Mode (0,0,0,0,11,116).
* Trying some_ip... connected
* Connecting to some_ip (some_ip) port 2932
> TYPE I
< 200 Type set to I.
> SIZE some_file
< 213 229376897
> RETR some_file
< 125 Data connection already open; Transfer starting.
* Maxdownload = -1
* Getting file with size: 229376897
{ [data not shown]
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 218M 100 218M 0 0 182k 0 0:20:28 0:20:28 --:--:-- 0* FTP response timeout
* control connection looks dead
100 218M 100 218M 0 0 182k 0 0:20:29 0:20:29 --:--:-- 0* Connection #0 to host some_server left intact
curl: (28) FTP response timeout
* Closing connection #0
The wget output is interesting as well: it notices that the control connection is dead, then retries the download, which only confirms that the file is already complete:
--2009-07-09 11:32:23-- ftp://some_server/some_file
=> `some_file'
Resolving some_server... 0.0.0.0
Connecting to some_server|0.0.0.0|:21... connected.
Logging in as some_user ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD not needed.
==> SIZE some_file ... 229376897
==> PASV ... done. ==> RETR some_file ... done.
Length: 229376897 (219M)
100%[==========================================================>] 229,376,897 387K/s in 18m 54s
2009-07-09 11:51:17 (198 KB/s) - Control connection closed.
Retrying.
--2009-07-09 12:06:18-- ftp://some_server/some_file
(try: 2) => `some_file'
Connecting to some_server|0.0.0.0|:21... connected.
Logging in as some_user ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD not needed.
==> SIZE some_file ... 229376897
==> PASV ... done. ==> REST 229376897 ... done.
==> RETR some_file ... done.
Length: 229376897 (219M), 0 (0) remaining
100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++] 229,376,897 --.-K/s in 0s
2009-07-09 12:06:18 (0.00 B/s) - `some_file' saved [229376897]
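What wget does above (ask for SIZE, notice the dead control connection, then confirm nothing is left to fetch) can be roughly imitated with ftplib. The following is only a sketch under some assumptions: the helper name fetch_and_verify is hypothetical, the connection is assumed to have been created with a socket timeout (for example FTP(host, timeout=60)) so that the hang turns into an exception, and the server is assumed to answer SIZE as it does in the curl and wget logs.

```python
import os
import socket
from ftplib import FTP  # used as in the question: ftp = FTP(host, timeout=60)


def fetch_and_verify(ftp, fname, dest=None):
    """Download fname, tolerating a dead control connection if all bytes arrived.

    Hypothetical helper, not part of ftplib. Returns True if the server's
    final reply was received normally, and False if the control connection
    timed out or closed but the local file already has the expected size.
    Any other failure is re-raised.
    """
    dest = dest or fname
    ftp.voidcmd('TYPE I')        # ask for SIZE in binary mode, as curl and wget do
    expected = ftp.size(fname)   # may be None if the server reports no size
    try:
        with open(dest, 'wb') as out:
            ftp.retrbinary('RETR ' + fname, out.write)
        return True
    except (socket.timeout, EOFError):
        # The file is closed by now, so its size on disk is final.
        if expected is not None and os.path.getsize(dest) == expected:
            return False
        raise
```

If the function returns False, the control connection is presumably unusable, so the caller would need to open and log in to a fresh FTP connection before requesting the next file.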