ansaurus

Question

Python Web-Scrape Loop via CSV list of URLs ???

Answer 1

When I copied your routine, I got a whitespace/tab error, so check your indentation. You were also indexing into the URL string incorrectly with your loop counter, which would have caused problems as well.

Also, you don't need a counter to control the loop. The version below iterates once for each row in your CSV file.

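# Python v2.6.2

import csv
import urllib2
import re

# Each row from csv.reader is a list of columns; the URL is in column 0.
urls = csv.reader(open('list.csv', 'rb'))
for url in urls:
    response = urllib2.urlopen(url[0])
    html = response.read()
    # Non-greedy match from 'td7' to the next 'td'.
    print re.findall('td7.*?td', html)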
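For anyone on Python 3, where urllib2 no longer exists, the same loop might look roughly like the sketch below. It is not the code from the question: the function names scrape_urls and fetch_html, the UTF-8 decoding, and the 10-second timeout are all assumptions made for illustration. The fetch step is passed in as a parameter so the loop can be tried without network access.

```python
import csv
import re
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and decode it as text (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_urls(csv_path, pattern=r"td7.*?td", fetch=fetch_html):
    """Return {url: matches} using the first column of each non-empty row."""
    results = {}
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if not row or not row[0].strip():
                continue  # skip blank lines instead of raising IndexError
            url = row[0].strip()
            results[url] = re.findall(pattern, fetch(url))
    return results


if __name__ == "__main__":
    # 'list.csv' is the file name used in the answer above.
    for url, matches in scrape_urls("list.csv").items():
        print(url, matches)
```

Collecting the results in a dict keeps each match list tied to the URL it came from, which makes it easier to tell which page produced which data.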

Lastly, be sure that your URLs are properly formed and listed one per line:

http://www.cnn.com
http://www.fark.com
http://www.cbc.ca
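The layout matters because csv.reader treats each line as a row and each comma as a column break, so url[0] only ever sees the first column of a row. The Python 3 sketch below illustrates this with in-memory text instead of a file. The helper names first_column, all_urls, and looks_like_http_url are hypothetical, and the URL check is deliberately rough.

```python
import csv
import io
from urllib.parse import urlparse

one_per_line = "http://www.cnn.com\nhttp://www.fark.com\nhttp://www.cbc.ca\n"
comma_separated = "http://www.cnn.com,http://www.fark.com,http://www.cbc.ca\n"


def first_column(csv_text):
    """Return row[0] for every non-empty row, as the loop above does."""
    return [row[0] for row in csv.reader(io.StringIO(csv_text)) if row]


def all_urls(csv_text):
    """Return every non-empty cell, so either layout yields all the URLs."""
    return [cell.strip()
            for row in csv.reader(io.StringIO(csv_text))
            for cell in row if cell.strip()]


def looks_like_http_url(url):
    """Rough check: an http(s) scheme and a host name are both present."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(first_column(one_per_line))     # three URLs, one per row
print(first_column(comma_separated))  # only the first URL of the single row
print(all_urls(comma_separated))      # all three URLs
```

If the file format cannot be changed, iterating over every cell as in all_urls is one way to handle both layouts.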

gdc 2009-09-18 01:09:04

Thanks! I tried this approach before but only got one result per list index, i.e., [0] only yielded col7 data for the first URL, [1] only yielded col7 data for the second, etc. Your second note sealed it: my URLs were in the wrong format, e.g., http://www.cnn.com,http://www.fark.com,http://www.cbc.ca. It worked once I changed to your format. Looks like I need to read more about proper Python/CSV formatting. Thanks again!

KenBurnsFan1 2009-09-18 01:32:48

Also, nice to receive help from a Canuck! My mother's side hails from the Salt Spring Island / Vancouver / Victoria areas -- I was very tempted to attend UVic. BC is crazy beautiful.

KenBurnsFan1 2009-09-18 01:44:51
