Hi, I am working on a script to download and process historical stock prices. When I used `urllib.request.urlopen`, I got a strange prefix of text in every file (`b'\xef\xbb\xbf`) that was not present when I used `urllib.request.urlretrieve`, nor when I typed the URL into a browser (Firefox). So I have a workaround, but I don't know why it was causing a problem in the first place. I suspect it may be because I forced the response to be a string, but I don't know why that would happen or how I would work around it (other than by using `urlretrieve` instead). The code is below. The relevant line is the `urlretrieve` call inside the loop; the commented-out code after it is what I used with `urlopen`.

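```python
# Download a bunch of historical stock quotes from Google Finance
import urllib.request

# Read one ticker symbol per line from symbols.txt
with open("symbols.txt") as symbolfile:
    symbolarray = [line.strip() for line in symbolfile]

for symbol in symbolarray:
    url = "http://www.google.com/finance/historical?q=NYSE:" + symbol + "&output=csv"
    # urlretrieve saves the response body straight to <symbol>.csv
    page = urllib.request.urlretrieve(url, symbol + ".csv")

    # Earlier urlopen version (this one produced the b'\xef\xbb\xbf prefix):
    # page = urllib.request.urlopen(url)
    # datafile = open(symbol + ".csv", "w")
    # datafile.write(str(page.read()))
    # datafile.close()
```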
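A minimal sketch of the likely cause, assuming the server sends UTF-8 CSV that begins with a byte-order mark (the three bytes `EF BB BF`, which is what `\xef\xbb\xbf` spells out). `urlopen(...).read()` returns `bytes`, and calling `str()` on a `bytes` object returns its repr rather than decoded text, so the `b'` prefix and the escaped BOM are written into the file as literal characters. Decoding with the `utf-8-sig` codec instead removes a leading BOM if one is present. The helper names `broken_conversion`, `decode_csv` and `save_response` and the sample payload are hypothetical, for illustration only; no network request is made.

```python
import codecs


def broken_conversion(raw: bytes) -> str:
    """What str(page.read()) does: returns the repr of the bytes, not decoded text."""
    return str(raw)


def decode_csv(raw: bytes) -> str:
    """Decode UTF-8 and drop a leading byte-order mark if there is one."""
    return raw.decode("utf-8-sig")


def save_response(raw: bytes, path: str) -> None:
    """Write the decoded text (BOM removed) as UTF-8, without newline translation."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(decode_csv(raw))


# Hypothetical response body: a UTF-8 BOM (EF BB BF) followed by a CSV header line.
sample = codecs.BOM_UTF8 + b"Date,Open,Close\n"

print(broken_conversion(sample))  # prints b'\xef\xbb\xbfDate,Open,Close\n'
print(repr(decode_csv(sample)))   # prints 'Date,Open,Close\n'
```

In the question's loop this would presumably become `save_response(urllib.request.urlopen(url).read(), symbol + ".csv")`. `urlretrieve`, by contrast, copies the raw bytes to disk unchanged, so if the server does send a BOM it is still in those files; most editors and browsers simply don't display it.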