I need to download a CSV file, which works fine in browsers using:

http://www.ftse.com/objects/csv_to_csv.jsp?infoCode=100a&theseFilters=&csvAll=&theseColumns=Mw==&theseTitles=&tableTitle=FTSE%20100%20Index%20Constituents&dl=&p_encoded=1&e=.csv

The following code works for any other URL that has a fully qualified file path, but with the URL above it downloads 800 bytes of gibberish.

import urllib2

def getFile(self, URL):
    # Route HTTP requests through the proxy.
    proxy_support = urllib2.ProxyHandler({'http': 'http://proxy.REMOVED.com:8080/'})
    opener = urllib2.build_opener(proxy_support)
    urllib2.install_opener(opener)
    response = urllib2.urlopen(URL)
    print response.geturl()  # final URL after any redirects
    newfile = response.read()
    # Write the raw bytes to disk unchanged.
    output = open("testFile.csv", 'wb')
    output.write(newfile)
    output.close()
A:

urllib2 uses httplib under the hood, so the best way to diagnose this is to turn on HTTP connection debugging. Add this code before you access the URL and you should get a summary of exactly what HTTP traffic is being generated:

import httplib
httplib.HTTPConnection.debuglevel = 1
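
On some Python versions the urllib2 HTTP handler sets its own debug level on each connection, so the class-level setting above may produce no output. A minimal sketch of an alternative is to pass `debuglevel` to the handler itself. It is written for Python 3, where urllib2 became `urllib.request`; under Python 2 the equivalent call is `urllib2.HTTPHandler(debuglevel=1)`. The function name `build_debug_opener`, its `proxy_url` parameter and the example proxy address are hypothetical, not taken from the question.

```python
import urllib.request


def build_debug_opener(proxy_url=None, debuglevel=1):
    """Return an opener whose HTTP handler prints request/response traffic.

    proxy_url is a hypothetical parameter for this sketch,
    e.g. "http://proxy.example.com:8080/". If it is None, the default
    proxy handling of build_opener() applies.
    """
    # The handler passes this debug level to every connection it creates.
    handlers = [urllib.request.HTTPHandler(debuglevel=debuglevel)]
    if proxy_url is not None:
        # Same idea as the ProxyHandler in the question's code.
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url}))
    return urllib.request.build_opener(*handlers)
```

Calling `build_debug_opener(proxy_url).open(URL)` should then print the request line, the request headers and the response headers to stdout. The response headers (for example `Content-Type` and `Content-Encoding`) are a reasonable first place to look when the body is not what the browser shows.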
ataylor