Hi, I'm trying to figure out how to add a for-loop to this script and I'm having a lot of trouble. I want to iterate through a list of subdomains stored in a CSV file (i.e., one column with 20 subdomains) and print the HTML for each. They all have the same SourceDomain. Thanks!

#Python 2.6
from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.SourceDomain.com")
        self.selenium.start()

    def test_untitled(self):
        sel = self.selenium
        sel.open("/dns/www.subdomains.com.html")
        sel.wait_for_page_to_load("30000")
        html = sel.get_html_source()
        print html

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()
A:
#Python 2.6
from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.SourceDomain.com")
        self.selenium.start()

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('your_file.csv'))
        for row in spamReader:
            sel.open(row[0])
            sel.wait_for_page_to_load("30000")
            print sel.get_html_source()

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()

By the way, there's no need to wrap this script in a unittest test case. Even better, you don't need Selenium for such a simple task (at least at first sight).
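To illustrate the first point, here is a minimal sketch of the same loop as a plain function instead of a test case. It assumes the old Selenium RC client used in the question and a server listening on localhost:4444; `dump_sources` and `main` are names made up for this example, not part of Selenium.

```python
import sys


def dump_sources(sel, paths, out=sys.stdout, timeout_ms="30000"):
    """Open each path in an already-started Selenium RC session and write its HTML."""
    for path in paths:
        sel.open(path)
        sel.wait_for_page_to_load(timeout_ms)
        out.write(sel.get_html_source())
        out.write("\n")


def main():
    # Imported here so dump_sources can be tried without Selenium installed.
    # Assumption: the Selenium RC Python client from the question is available.
    import csv
    from selenium import selenium

    sel = selenium("localhost", 4444, "*firefox", "http://www.SourceDomain.com")
    sel.start()
    try:
        with open("your_file.csv") as handle:
            # Each row is a list of columns; skip completely blank rows.
            paths = [row[0] for row in csv.reader(handle) if row]
        dump_sources(sel, paths)
    finally:
        sel.stop()  # close the browser even if a page fails to load
```

Call `main()` with the Selenium server running. The `try`/`finally` does the job that `tearDown` did in the unittest version.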

Without Selenium, you could try this:

import urllib2, csv

def fetchsource(url):
    page = urllib2.urlopen(url)
    source = page.read()
    return source

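fooReader = csv.reader(open('your_file.csv'))
for row in fooReader:
    # each row is a list of columns; the URL is in the first one
    # (urlopen needs full URLs, including the http:// scheme)
    print fetchsource(row[0])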
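Since csv.reader yields every row as a list, a small helper that pulls out just the first column may be handy for either approach. This is only a sketch: `first_column` is a made-up name, and it assumes the URLs or paths sit in the first column with no header row.

```python
import csv


def first_column(lines):
    """Return the non-empty first-column values from an iterable of CSV lines."""
    values = []
    for row in csv.reader(lines):
        # Blank lines come back as an empty list, so guard before indexing.
        if row and row[0].strip():
            values.append(row[0].strip())
    return values
```

It accepts anything csv.reader accepts, so an open file works: `first_column(open('your_file.csv'))`. The result is a plain list that can be passed to the Selenium loop or to fetchsource one item at a time.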
Santi
Thanks - I'm trying to test your first answer now. I couldn't get urllib2 to work because these pages use a LOT of JavaScript, which is why Alex Martelli advised me to use Selenium.
KenBurnsFan1
I kept getting a syntax error because I forgot the second closing bracket on (open('your_file.csv')) :-P It works! Thank you!
KenBurnsFan1
Ah, that's one of the few reasons you would use Selenium for this kind of task. Glad to see it helped.
Santi