I am using Python and the Twisted framework to connect to an FTP site to perform various automated tasks. Our FTP server happens to be Pure-FTPd, if that's relevant.

When connecting and calling the list method on an FTPClient, the resulting FTPFileListProtocol's files collection does not contain any directories or file names that contain a space (' ').

Has anyone else seen this? Is the only solution to create a sub-class of FTPFileListProtocol and override its unknownLine method, parsing the file/directory names manually?

+2  A: 

Firstly, if you're performing automated tasks on a retrieved FTP listing, then you should probably be looking at NLST rather than LIST, as noted in RFC 959 section 4.1.3:

 NAME LIST (NLST)
 ...
            This command is intended to return information that
            can be used by a program to further process the
            files automatically.

The Twisted documentation for LIST says:

It can cope with most common file listing formats.

This makes me suspicious; I do not like solutions that "cope". LIST was intended for human consumption, not machine processing.

If your target server supports them then you should prefer MLST and MLSD as defined in RFC 3659 section 7:

7.  Listings for Machine Processing (MLST and MLSD)

   The MLST and MLSD commands are intended to standardize the file and
   directory information returned by the server-FTP process.  These
   commands differ from the LIST command in that the format of the
   replies is strictly defined although extensible.
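To illustrate why a strictly defined format helps with names containing spaces, here is a minimal sketch of splitting one MLSD/MLST entry, assuming the RFC 3659 layout of semicolon-terminated facts, a single space, then the pathname. The function name and the sample line are made up for illustration; this is not taken from Twisted.

```python
def parse_mlsx_entry(line):
    """Split one MLSD/MLST entry into (facts, pathname).

    Assumes the RFC 3659 layout: "fact=value;fact=value; pathname".
    Fact values cannot contain a space, so the first space ends the facts
    and everything after it, spaces included, is the pathname.
    """
    facts_part, _, pathname = line.rstrip("\r\n").partition(" ")
    facts = {}
    for fact in facts_part.split(";"):
        if fact:  # skip the empty piece after the trailing ";"
            name, _, value = fact.partition("=")
            facts[name.lower()] = value  # fact names are case-insensitive
    return facts, pathname


# Hypothetical sample entry, not captured from a real server.
sample = "type=file;size=1024;modify=20090112103000; my report.txt"
facts, pathname = parse_mlsx_entry(sample)
print(pathname, facts)
```

Because the pathname is simply "everything after the first space", no guessing about column layout is needed.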

However, these newer commands may not be available on your target server and I don't see them in Twisted. Therefore NLST is probably your best bet.

As to the nub of your problem, there are three likely causes:

  1. The processing of the returned results is incorrect (Twisted may be at fault, as you suggest, or perhaps elsewhere)
  2. The server is buggy and not sending a correct (complete) response
  3. The wrong command is being sent (unlikely with straight NLST/LIST, but some servers react differently if arguments are supplied to these commands)

You can eliminate (2) and (3) and prove that the cause is (1) by looking at what is sent over the wire. If this option is not available to you as part of the Twisted API or the Pure-FTPd server logging configuration, then you may need to break out a network sniffer such as tcpdump, snoop or Wireshark (assuming you're allowed to do this in your environment). Note that you will need to trace not only the control connection (port 21) but also the data connection, since that carries the results of the LIST/NLST command. Wireshark is nice since it will perform the protocol-level analysis for you.
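If a sniffer is not an option, a rough alternative is to fetch the raw listing lines with a second client and compare them against what the parser kept. The sketch below assumes the standard library's ftplib can reach the same server; `raw_listing` and `dropped_lines` are hypothetical helper names, and the comparison is only a heuristic (it will also flag summary lines such as "total 12" and symlink lines).

```python
def raw_listing(ftp, command="LIST"):
    """Return the untouched lines the server sends for LIST or NLST.

    `ftp` is expected to be a connected, logged-in ftplib.FTP instance
    (or anything with a compatible retrlines method).
    """
    lines = []
    ftp.retrlines(command, lines.append)
    return lines


def dropped_lines(raw_lines, parsed_names):
    """Return raw lines that do not end with any name the parser produced."""
    return [
        line
        for line in raw_lines
        if not any(line == name or line.endswith(" " + name)
                   for name in parsed_names)
    ]
```

If the raw lines contain the names with spaces but the parsed collection does not, the server and the command are fine and the parsing is at fault, which is cause (1).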

Good luck.

Martin Carpenter
I'm giving you the accepted answer, mostly because you are correct that I should probably be using NLST instead of LIST. Thanks.
Ryan Duffield
A: 

This is somewhat expected. FTPFileListProtocol isn't able to understand every FTP output, because, well, some are wacky. As explained in the docstring:

If you need different evil for a wacky FTP server, you can override either C{fileLinePattern} or C{parseDirectoryLine()}.

In this case, it may be a bug: maybe you can improve fileLinePattern and make it understand filenames with spaces. If so, you're welcome to open a bug in the Twisted tracker.
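As a starting point, here is a sketch of a pattern for Unix-style LIST lines that accepts spaces in the filename. It is plain `re` and has not been tested against Twisted; the pattern, the group names and the sample lines are assumptions for illustration. Per the docstring quoted above, something like it could be assigned to fileLinePattern on a subclass.

```python
import re

# Assumed Unix "ls -l" style layout; real servers may differ.
LIST_LINE = re.compile(
    r'^(?P<filetype>.)(?P<perms>.{9})\s+(?P<nlinks>\d+)\s+'
    r'(?P<owner>\S+)\s+(?P<group>\S+)\s+(?P<size>\d+)\s+'
    r'(?P<date>\w{3}\s+\d+\s+[\d:]+)\s'
    # Lazy match so an optional " -> target" is split off for symlinks,
    # while ordinary spaces inside the name are kept.
    r'(?P<filename>.+?)'
    r'(?: -> (?P<linktarget>.+))?\r?$'
)


def parse_list_line(line):
    """Return a dict of fields for one LIST line, or None if it does not match."""
    match = LIST_LINE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample lines, not captured from a real server.
plain = parse_list_line(
    "-rw-r--r--   1 ftp      ftp          1024 Jan 12 10:30 my report.txt")
link = parse_list_line(
    "lrwxrwxrwx   1 ftp      ftp             9 Jan 12 10:30 latest build -> build 42")
print(plain["filename"], "|", link["filename"], "->", link["linktarget"])
```

A name that itself contains " -> " would still be split in the wrong place, which is one more reason to prefer NLST or MLSD where possible.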

Thomas Hervé