views:

188

answers:

4

I am trying to to get all directories' name from an FTP server and store them in hierarchical order in a multidimensional list or dict

So for example, a server that contains the following structure:

/www/
    mysite.com
        images
            png
            jpg

at the end of the script, would give me a list such as

['/www/'
  ['mysite.com'
    ['images'
      ['png'],
      ['jpg']
    ]
  ]
]

I have tried using a recursive function like so: def traverse(dir): FTP.dir(dir, traverse)

FTP.dir returns lines in this format:

drwxr-xr-x    5 leavesc1 leavesc1     4096 Nov 29 20:52 mysite.com

so doing line[56:] will give me just the directory name(mysite.com). I use this in the recursive function.

But i cannot get it to work. I've tried many different approaches and can't get it to work. Lots of FTP errors as well (either can't find the directory - which is a logical issue, and sometimes unexpected errors returned by the server, which leaves no log and i can't debug)

bottom line question: How to get a hierarchical directory listing from an FTP server?

+3  A: 

Here is a naive and slow implementation. It is slow because it tries to CWD to each directory entry to determine if it is a directory or a file, but this works. One could optimize it by parsing LIST command output, but this is strongly server-implementation dependent.

import ftplib

def traverse(ftp, depth=0):
    """
    return a recursive listing of an ftp server contents (starting
    from the current directory)

    listing is returned as a recursive dictionary, where each key
    contains a contents of the subdirectory or None if it corresponds
    to a file.

    @param ftp: ftplib.FTP object
    """
    if depth > 10:
        return ['depth > 10']
    level = {}
    for entry in (path for path in ftp.nlst() if path not in ('.', '..')):
        try:
            ftp.cwd(entry)
            level[entry] = traverse(ftp, depth+1)
            ftp.cwd('..')
        except ftplib.error_perm:
            level[entry] = None
    return level

def main():
    ftp = ftplib.FTP("localhost")
    ftp.connect()
    ftp.login()
    ftp.set_pasv(True)

    print traverse(ftp)

if __name__ == '__main__':
    main()
abbot
Thanks so much for the reply! wish i could accept it 5x! :) I just realized though, that although this works, it doesn't work all the way. I have 18 directories in my root dir. This script successfully traversed 7 of them. The rest were just not added to the level dict.
lyrae
You can check the first character of a line - `d` in `drwxr-xr-x` means it's a directory. If it's not, simply skip it. Another approach would be to parse the output of LIST -R
LiraNuna
@LiraNuna: actually you can't. The first character can be also an l meaning that this is a symlink ether to a file or to a directory. And in general it is strongly server-dependent by the definition of the FTP protocol.
abbot
how about using MLSD? it says "type=dir" or "type=file". However, i can't find a way to use ftp.retrlines() (regardless of first argument) instead of ftp.nlst(). Mainly because ftp.nlst() returns a list of elements, as ftp.retrlines() prints to sys.stdout. I have tried passing the output to a function to store in a global list, and loop through that list in the traverse function, ie: 'for entry in list: etc' but it does not work. If anyone would like to help, it would be appreciated :) thanks again to all of you!
lyrae
@lyrae: check [that](http://stackoverflow.com/questions/2867217/how-to-delete-files-with-a-python-script-from-a-ftp-server-which-are-older-than-7/3114477#3114477) answer of mine in a related question, where I used a class method as the callback.
ΤΖΩΤΖΙΟΥ
+1  A: 

You're not going to like this, but "it depends on the server" or, more accurately, "it depends on the output format of the server".

Different servers can be set to display different output, so your initial proposal is bound to failure in the general case.

The "naive and slow implementation" above will cause enough errors that some FTP servers will cut you off (which is probably what happened after about 7 of them...).

Paul Hoffman
Most likely.is there, then, another way of grabbing a site's directory structure?
lyrae
A: 

If we are using Python look at:

http://docs.python.org/library/os.path.html (os.path.walk)

If there already is a good module for this, don't reinvent the wheel. Can't believe the post two spots above got two ups, anyway, enjoy.

Anders
You can't do that on an FTP server.
lyrae
Ofcourse you can, just add it as a path.
Anders
+1  A: 

If the server supports the MLSD command, then use the “a directory and its descendants” code from that answer.

ΤΖΩΤΖΙΟΥ
many many thanks for this.
lyrae