I've got the following code:

for f in fileListProtocol.files:
    if f['filetype'] == '-':
        filename = os.path.join(directory['filename'], f['filename'])
        print 'Downloading %s...' % (filename)
        newFile = open(filename, 'w+')
        d = ftpClient.retrieveFile(filename, FileConsumer(newFile))
        d.addCallback(closeFile, newFile)

Unfortunately, after downloading several hundred of the 1000+ files in the directory in question, I get an IOError about too many open files. Why is this happening when I should be closing each file after it has been downloaded? If there's a more idiomatic way to approach the whole task of downloading lots of files, I'd love to hear it too. Thanks.

Update: Jean-Paul's DeferredSemaphore example plus Matt's FTPFile did the trick. For some reason, using a Cooperator instead of DeferredSemaphore downloaded a few files and then failed because the FTP connection had died.

+1  A: 

You're opening every file in fileListProtocol.files simultaneously, downloading contents to them, and then closing each when each download is complete. So, you have len(fileListProtocol.files) files open at the beginning of the process. If there are too many files in that list, then you'll try to open too many files.

You probably want to limit yourself to some fairly small number of parallel downloads at once (if FTP even supports parallel downloads, which I'm not entirely certain is the case).

http://jcalderone.livejournal.com/24285.html and http://stackoverflow.com/questions/2861858/queue-remote-calls-to-a-python-twisted-perspective-broker may be of some help in figuring out how to limit the number of downloads you start in parallel.

Jean-Paul Calderone
From looking at `FTPClientBasic` I was under the impression that it queued commands.
MattH
I had no idea that I was opening every file but now that I think about it that makes sense. Naturally I want to only have a few open. I'll look into your links.
pr1001
Jean-Paul, I tried to use your Cooperator code, which looks like exactly what I need, but unfortunately got the exact same problem. Would you say that the `work` iterator in `parallel` is opening all files simultaneously still? Do I need Matt's code also?
pr1001
It's hard to say without seeing all of the actual code. Matt's version has the advantage of only opening files when the server is ready to send them to you. This is cool, but it shouldn't strictly be necessary as long as you're keeping the total number of outstanding requests below the open file limit (minus some for however many random other files (and sockets!) you have open).
Jean-Paul Calderone
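To get a feel for how many outstanding requests are safe, the open file limit can be read with the standard `resource` module (Unix only). This is only a sketch: `safe_parallelism` and the `reserved` margin of 32 descriptors are assumptions for illustration, not values taken from Twisted or from the code in the question.

```python
import resource


def safe_parallelism(wanted, reserved=32):
    """Cap `wanted` so it stays below the process's open file limit.

    `reserved` is a guessed allowance for sockets, logs and other
    descriptors the process already holds.
    """
    # RLIMIT_NOFILE is the per-process limit on open file descriptors.
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return wanted
    return max(1, min(wanted, soft_limit - reserved))
```

With a typical soft limit of 1024, asking for 10 parallel downloads is left unchanged, while opening 1000+ files at once would clearly go over.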
+1  A: 

Assuming that you're using FTPClient from twisted.protocols.ftp... and I certainly hesitate before contradicting JP..

It seems that the FileConsumer class you're passing to retrieveFile will be adapted to IProtocol by twisted.internet.protocol.ConsumerToProtocolAdapter, which doesn't call unregisterProducer, so FileConsumer doesn't close the file object.

I've knocked up a quick protocol that you can use to receive the files. I think it should only open the file when appropriate. It's totally untested; you'd use it in place of FileConsumer in your code above, and you won't need the addCallback.

from twisted.python import log
from twisted.internet import interfaces
from zope.interface import implementer


@implementer(interfaces.IProtocol)
class FTPFile(object):
    """
    A consumer for FTP input that writes data to a file.

    @ivar filename: a filename to be opened for writing.
    """

    def __init__(self, filename):
        self.fObj = None
        self.filename = filename

    def makeConnection(self, transport):
        # Open the file only once the data connection is established.
        self.fObj = open(self.filename, 'wb')
        log.msg('Opened %s for writing' % self.filename)

    def connectionLost(self, reason):
        # The transfer has ended (or failed), so release the file.
        self.fObj.close()
        log.msg('Closed %s' % self.filename)

    def dataReceived(self, data):
        self.fObj.write(data)
MattH
I don't see a contradiction here. :) Opening the files just in time makes sense if `FTPClient` is already serializing the operations. You'd still need to be careful if you're connecting to many different FTP servers though (or even if you're opening many different connections to a single FTP server).
Jean-Paul Calderone