If I have a directory on a remote web server that allows directory browsing, how would I go about fetching all the files listed there from my other web server? I know I can use urllib2.urlopen to fetch individual files, but how would I get a list of all the files in that remote directory?
A:
If the web server has directory browsing enabled, it will return an HTML document with links to all the files. You can parse that HTML document and extract all the links, which gives you the list of files.
You can use the HTMLParser class to extract the elements you're interested in. Something like this will work:
from HTMLParser import HTMLParser
import urllib

class AnchorParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # Print the href attribute of every <a> tag in the page
        if tag == 'a':
            for key, value in attrs:
                if key == 'href':
                    print value

parser = AnchorParser()
data = urllib.urlopen('http://somewhere').read()
parser.feed(data)
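If you want the files themselves rather than just the printed links, here is a minimal sketch that collects the hrefs into a list and downloads each one with urllib.urlretrieve. The base URL is the same placeholder as above, and the filtering of parent-directory and sort links is an assumption about what a typical directory listing contains:

from HTMLParser import HTMLParser
import urllib
import urlparse
import os

class LinkCollector(HTMLParser):
    # Collect the href of every anchor tag into self.links
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for key, value in attrs:
                if key == 'href':
                    self.links.append(value)

# Placeholder listing URL, as in the snippet above
base_url = 'http://somewhere/dir/'
collector = LinkCollector()
collector.feed(urllib.urlopen(base_url).read())

for href in collector.links:
    # Skip parent-directory and column-sorting links that listings usually include
    if href.startswith('..') or href.startswith('?') or href.startswith('/'):
        continue
    # Resolve relative links against the listing URL and download the file
    urllib.urlretrieve(urlparse.urljoin(base_url, href), os.path.basename(href))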
Robert Christie
2009-11-09 08:29:24
That does the trick indeed. Thanks for the suggestion!
tomlog
2009-11-09 09:15:34
A:
Why don't you use curl or wget to recursively download the given page, limited to one level of recursion? You will save yourself all the trouble of writing the script.
e.g. something like
wget -H -r --level=1 -k -p www.yourpage/dir
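If you would still rather drive this from Python, a minimal sketch is to shell out to the same wget command with subprocess and then walk the downloaded directory. This assumes wget is installed and on the PATH, and the URL is the placeholder from the command above:

import subprocess
import os

# A sketch only: invoke wget with the same flags as the command above
url = 'http://www.yourpage/dir'   # placeholder URL from the answer
subprocess.check_call(['wget', '-H', '-r', '--level=1', '-k', '-p', url])

# By default wget saves the mirror into a directory named after the host
for root, dirs, files in os.walk('www.yourpage'):
    for name in files:
        print os.path.join(root, name)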
Anurag Uniyal
2009-11-09 08:35:38
I want to use the retrieved files in my Python code, so it's easier for me to script it.
tomlog
2009-11-09 08:52:47