I have to download some files from an FTP server. That seems prosaic enough. However, this server misbehaves: if a file is very large, the connection just hangs once the download has ostensibly completed.

How can I handle this gracefully using ftplib in python?

Sample Python code:

from ftplib import FTP

...

ftp = FTP(host)
ftp.login(login, passwd)
files = ftp.nlst()
ftp.set_debuglevel(2)

for fname in files:
    # close each local file once retrbinary returns
    with open(fname, 'wb') as f:
        ret_status = ftp.retrbinary('RETR ' + fname, f.write)

Debug output from the above:

*cmd* 'TYPE I'
*put* 'TYPE I\r\n'
*get* '200 Type set to I.\r\n'
*resp* '200 Type set to I.'
*cmd* 'PASV'
*put* 'PASV\r\n'
*get* '227 Entering Passive Mode (0,0,0,0,10,52).\r\n'
*resp* '227 Entering Passive Mode (0,0,0,0,10,52).'
*cmd* 'RETR some_file'
*put* 'RETR some_file\r\n'
*get* '125 Data connection already open; Transfer starting.\r\n'
*resp* '125 Data connection already open; Transfer starting.'
[just sits there indefinitely]

This is what it looks like when I attempt the same download using curl -v:

* About to connect() to some_server port 21 (#0)
*   Trying some_ip... connected
* Connected to some_server (some_ip) port 21 (#0)
< 220 Microsoft FTP Service
> USER some_user
< 331 Password required for some_user.
> PASS some_password
< 230 User some_user logged in.
> PWD
< 257 "/some_dir" is current directory.
* Entry path is '/some_dir'
> EPSV
* Connect data stream passively
< 500 'EPSV': command not understood
* disabling EPSV usage
> PASV
< 227 Entering Passive Mode (0,0,0,0,11,116).
*   Trying some_ip... connected
* Connecting to some_ip (some_ip) port 2932
> TYPE I
< 200 Type set to I.
> SIZE some_file
< 213 229376897
> RETR some_file
< 125 Data connection already open; Transfer starting.
* Maxdownload = -1
* Getting file with size: 229376897
{ [data not shown]
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  218M  100  218M    0     0   182k      0  0:20:28  0:20:28 --:--:--     0* FTP response timeout
* control connection looks dead
100  218M  100  218M    0     0   182k      0  0:20:29  0:20:29 --:--:--     0* Connection #0 to host some_server left intact

curl: (28) FTP response timeout
* Closing connection #0

The wget output is interesting as well: it notices that the control connection is dead, then attempts to re-download the file, which only confirms that the file is already complete:

--2009-07-09 11:32:23--  ftp://some_server/some_file
           => `some_file'
Resolving some_server... 0.0.0.0
Connecting to some_server|0.0.0.0|:21... connected.
Logging in as some_user ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD not needed.
==> SIZE some_file ... 229376897
==> PASV ... done.    ==> RETR some_file ... done.
Length: 229376897 (219M)

100%[==========================================================>] 229,376,897  387K/s   in 18m 54s 

2009-07-09 11:51:17 (198 KB/s) - Control connection closed.
Retrying.

--2009-07-09 12:06:18--  ftp://some_server/some_file
  (try: 2) => `some_file'
Connecting to some_server|0.0.0.0|:21... connected.
Logging in as some_user ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD not needed.
==> SIZE some_file ... 229376897
==> PASV ... done.    ==> REST 229376897 ... done.    
==> RETR some_file ... done.
Length: 229376897 (219M), 0 (0) remaining

100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++] 229,376,897 --.-K/s   in 0s      

2009-07-09 12:06:18 (0.00 B/s) - `some_file' saved [229376897]
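For comparison, wget's reconnect-and-resume behaviour could be approximated with ftplib roughly as below. This is an untested Python 3 sketch: `fetch_like_wget`, `connect` and `retries` are hypothetical names, and it assumes `connect` returns a logged-in `ftplib.FTP` created with a `timeout`, so that the hang raises `socket.timeout` instead of blocking forever.

```python
import os
import socket


def fetch_like_wget(connect, fname, retries=3):
    """Hypothetical helper: fetch fname, reconnecting and resuming with REST
    when the control connection hangs after the data has arrived.

    connect is an assumed zero-argument callable returning a logged-in
    ftplib.FTP created with a timeout, so a hang raises instead of blocking.
    """
    for _ in range(retries):
        ftp = connect()
        try:
            ftp.voidcmd('TYPE I')  # some servers only answer SIZE in binary mode
            remote_size = ftp.size(fname)
            local_size = os.path.getsize(fname) if os.path.exists(fname) else 0
            if remote_size is not None and local_size >= remote_size:
                return True  # like wget's "0 (0) remaining": nothing left to fetch
            with open(fname, 'ab') as f:
                # rest=None sends no REST; otherwise resume after local_size bytes
                ftp.retrbinary('RETR ' + fname, f.write, rest=local_size or None)
            return True
        except (socket.timeout, EOFError):
            continue  # control connection looked dead: reconnect and re-check
        finally:
            ftp.close()
    return False
```

With the variables from the code above, `connect` could be a small function that does `ftp = FTP(host, timeout=60)`, calls `ftp.login(login, passwd)` and returns `ftp`; the 60-second value is an arbitrary assumption and must exceed any normal pause in the data stream.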
A: 

I've never used ftplib, but perhaps you could do the following:

  1. Get the name and size of the file you want.
  2. Start a new daemonic thread to download the file.
  3. In the main thread, check every few seconds whether the file size on disk equals the target size.
  4. When it does, wait a few seconds to give the connection a chance to close nicely, and then exit the program.
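The steps above could be sketched roughly as follows in Python 3. This is untested: `wait_for_file` and its `poll`, `grace` and `timeout` parameters are hypothetical, and `download` is assumed to be a zero-argument callable wrapping `ftp.retrbinary` whose callback flushes after each block (otherwise the last buffered chunk may never reach disk and the size check cannot succeed).

```python
import os
import threading
import time


def wait_for_file(download, path, expected_size, poll=2.0, grace=5.0, timeout=None):
    """Hypothetical helper: run download() in a daemon thread and poll path
    until it reaches expected_size.

    Returns True once the file is complete, False if the download thread
    finishes first with a short file or the optional timeout expires.
    """
    # Step 2: a daemon thread won't keep the interpreter alive if it hangs.
    worker = threading.Thread(target=download, daemon=True)
    worker.start()
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        # Step 3: compare the size on disk with the target size.
        if os.path.exists(path) and os.path.getsize(path) >= expected_size:
            # Step 4: give the connection a chance to close cleanly.
            worker.join(grace)
            return True
        if not worker.is_alive():
            # The thread ended (or crashed); check one last time.
            return os.path.exists(path) and os.path.getsize(path) >= expected_size
        if deadline is not None and time.monotonic() > deadline:
            return False
        time.sleep(poll)
```

Because the worker is a daemon thread, a transfer stuck waiting for the server's final reply does not prevent the program from exiting once the main thread is done.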
John Fouhy
A: 

I think some debugging would be useful. Could you fold the class below into your code? (I didn't do it myself because I know this version works and didn't want to risk introducing an error. You should be able to put the class at the top of your file and replace the body of the loop with what I've written after # LOOP BODY.)

class CounterFile(object):
    """Wraps a file and reports download progress on every write."""

    def __init__(self, file, maxsize):
        self.file = file
        self.count = 0
        self.maxsize = maxsize

    def write(self, data):
        self.count += len(data)
        print("total %d bytes / %d" % (self.count, self.maxsize))
        if self.count == self.maxsize:
            print("   Should be complete")
        self.file.write(data)


from ftplib import FTP

ftp = FTP('ftp.gimp.org')
# placeholder address: anonymous FTP conventionally takes an email as the password
ftp.login('ftp', 'user@example.com')
ftp.set_debuglevel(2)

ftp.cwd('/pub/gimp/v2.6/')
fname = 'gimp-2.6.2.tar.bz2'

# LOOP BODY
sz = ftp.size(fname)
if sz is None:
    print("Could not get size!")
    sz = 0
with open(fname, 'wb') as f:
    ret_status = ftp.retrbinary('RETR ' + fname, CounterFile(f, sz).write)
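If the counter shows every byte arriving before the hang, one possible follow-up is to have the callback raise once the expected size is reached, so `retrbinary` unwinds before it waits for the server's final reply. This is an untested Python 3 sketch: `TransferComplete`, `StopAtSize` and `retr_until_size` are hypothetical names, and it assumes the server reports an accurate `SIZE`.

```python
import ftplib


class TransferComplete(Exception):
    """Raised from the callback once every expected byte has been written."""


class StopAtSize(object):
    """Hypothetical retrbinary callback: writes to file, aborts at expected size."""

    def __init__(self, file, expected):
        self.file = file
        self.expected = expected
        self.count = 0

    def write(self, data):
        self.file.write(data)
        self.count += len(data)
        if self.count >= self.expected:
            # Raising here exits retrbinary before it reads the final reply.
            raise TransferComplete(self.count)


def retr_until_size(ftp, fname):
    """Download fname and return the number of bytes written."""
    ftp.voidcmd('TYPE I')  # some servers refuse SIZE in ASCII mode
    expected = ftp.size(fname)
    if expected is None:
        raise ftplib.error_reply('server did not report a size for ' + fname)
    with open(fname, 'wb') as f:
        writer = StopAtSize(f, expected)
        try:
            ftp.retrbinary('RETR ' + fname, writer.write)
        except TransferComplete:
            # The control channel may still owe a reply; don't reuse it.
            ftp.close()
    return writer.count
```

The trade-off is that the connection must be discarded after each cut-short file, so downloading a list of files this way means reconnecting between files.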
thouis