Hi, I am using Selenium RC to cycle through a long list of URLs, sequentially writing the HTML from each URL to a CSV file. Problem: the program frequently exits at various points in the list due to "Timed out after 30000ms" exceptions. Instead of stopping when it hits a URL timeout, I was trying to have the program simply write a note of the timeout in the CSV file (in the row where the HTML for that URL would have gone) and move on to the next URL in the list. I attempted to add an 'else' clause to my program, but it doesn't seem to help (see below), i.e. the program still stops every time it hits a timeout. I also seem to get 30000ms timeout exceptions even when I start selenium-server with a longer timeout, e.g. "java -jar selenium-server.jar -timeout 600000".

Any advice would be much appreciated. Thank you.

from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.MainDomain.com")
        self.selenium.start()

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('SubDomainList.csv', 'rb'))
        for row in spamReader:
            sel.open(row[0])
            sel.wait_for_page_to_load("400000")
            time.sleep(5)
            html = sel.get_html_source()
            ofile = open('output4001-5000.csv', 'ab')
            ofile.write(html + '\n')
            ofile.close
        else:
            ofile = open('outputTest.csv', 'ab')
            ofile.write("URL Timeout" + '\n')
            ofile.close

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()
A:

Try the following:

from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://example.com")
        self.selenium.start()
        self.selenium.set_timeout("60000")

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('SubDomainList.csv', 'rb'))
        for row in spamReader:
            try:
                sel.open(row[0])
            except Exception as e:
                ofile = open('outputTest.csv', 'ab')
                # End each note with a newline so errors land on separate lines
                ofile.write("error on %s: %s\n" % (row[0], e))
            else:
                time.sleep(5)
                html = sel.get_html_source()
                ofile = open('output4001-5000.csv', 'ab')
                ofile.write(html.encode('utf-8') + '\n')
            ofile.close()

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()

Some comments:

  • You don't need a wait_for_page_to_load after an open. It will cause timeouts: once the page has loaded after the open, the second call starts waiting again for a page load that never happens.
  • Most of the failures you get from Selenium (timeouts, object not found) can be caught with try-except statements.
  • You should set the timeout within the test itself (using set_timeout). That way it doesn't depend on how you start the server, and it will always wait as long as you wanted.
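
To see the try-except-else flow on its own, without a Selenium server running, here is a minimal sketch. The names save_pages, fetch, write_row and fake_fetch are hypothetical and not part of Selenium; fetch stands in for the sel.open plus sel.get_html_source pair and is assumed to raise an exception on a timeout.

```python
def save_pages(urls, fetch, write_row):
    """Write one row per URL: the page HTML, or an error note if fetching fails.

    fetch and write_row are hypothetical callables supplied by the caller.
    Returns the number of URLs that failed.
    """
    failures = 0
    for url in urls:
        try:
            html = fetch(url)
        except Exception as e:
            # Broad catch, as in the answer above: note the failure and move on
            write_row([url, "error: %s" % e])
            failures += 1
        else:
            # Runs only when fetch did not raise
            write_row([url, html])
    return failures


def fake_fetch(url):
    """Stand-in for the browser calls; pretends that 'slow' URLs time out."""
    if "slow" in url:
        raise Exception("Timed out after 30000ms")
    return "<html>%s</html>" % url


rows = []
failed = save_pages(
    ["http://example.com/a", "http://example.com/slow", "http://example.com/b"],
    fake_fetch,
    rows.append,
)
```

The loop reaches the third URL even though the second one raised. In the real test, write_row could be the writerow method of a csv.writer, which would also take care of quoting any commas and newlines in the HTML.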
Santi
you da man - testing now.
KenBurnsFan1
I've got issues somewhere, but I'm sure this is the answer. I will continue tweaking tonight. Thanks!
KenBurnsFan1
Oops, the failure was caused by some indentation issues, fixed now. I also added the encoding before writing the file, just in case your website has some unicode chars in it.
Santi
You hit the nail on the head. I spent the better half of the day trying to figure out how to escape non-ascii chars -- when I could have just looked back at your answer to this question. You da man!
KenBurnsFan1
Hah! Unicode and all its madness... Let's hope Python 3 comes soon so we can forget about this kind of issue.
Santi
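
For anyone landing here with the same non-ASCII problem discussed in the comments, here is a rough sketch of an alternative to calling encode by hand. The helper name append_html and the file name pages.txt are made up for illustration, and the sketch assumes the page source arrives as a unicode string. It uses io.open, which accepts an encoding argument on both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile


def append_html(path, html):
    """Append one page of HTML (a unicode string) to path, encoded as UTF-8."""
    # newline="\n" keeps line endings identical on every platform
    with io.open(path, "a", encoding="utf-8", newline="\n") as f:
        f.write(html + u"\n")


# Demo with a throwaway file; the e-acute stands in for any non-ASCII char
path = os.path.join(tempfile.mkdtemp(), "pages.txt")
append_html(path, u"<p>caf\u00e9</p>")
append_html(path, u"<p>plain</p>")

with io.open(path, "rb") as f:
    raw = f.read()
```

The file object does the encoding on write, so the calling code never mixes byte strings and unicode strings.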