I'm not sure how to find this information. I have found a few tutorials on using Python with Selenium, but none of them touch on this. I can run some basic test scripts through Python that automate Selenium, but they just show the browser window for a few seconds and then close it. I need to get the browser output into a string/variable (ideally), or at least save it to a file, so that Python can do other things with it (parse it, etc.). I would appreciate it if anyone could point me towards resources on how to do this. Thanks.

+1  A: 

OK, so here is how I ended up doing this, for anyone who needs it in the future.

You have to use Firefox for this to work.

1) Create a new Firefox profile. This is not strictly necessary, but it is ideal for keeping this separate from your normal Firefox usage. There is plenty of info on Google about how to do this; the steps depend on your OS.

2) Get the Firefox plugin: https://addons.mozilla.org/en-US/firefox/addon/2704/ (it automatically saves all pages for a given domain name). You need to configure it with whichever domains you intend to auto-save.

3) Then just start the Selenium server with the profile you created (below is an example for Linux):

cd /root/Downloads/selenium-remote-control-1.0.3/selenium-server-1.0.3 
java -jar selenium-server.jar -firefoxProfileTemplate /path_to_your_firefox_profile/

That's it. It will now save every page for the configured domain names whenever Selenium visits them. Selenium does create a bunch of garbage pages too, so you could delete those with a simple regex. From there, it is up to you how to manipulate the saved pages.
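For the cleanup step, something along these lines might do as a rough sketch. The function name delete_matching is made up for illustration, and the regex you pass in is entirely an assumption: check what the unwanted files are actually called in your save directory before deleting anything.

```python
import os
import re


def delete_matching(directory, pattern):
    """Delete files in `directory` whose names match `pattern`.

    Returns the sorted list of file names that were removed.
    `pattern` is a placeholder for whatever identifies the unwanted pages.
    """
    regex = re.compile(pattern)
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only touch regular files; re.search matches anywhere in the name.
        if os.path.isfile(path) and regex.search(name):
            os.remove(path)
            removed.append(name)
    return removed
```

You would call it with the directory the plugin saves into and a pattern of your own. It only looks at the top level of that directory and does not recurse.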

Rick
+2  A: 

There's a Selenium.getHtmlSource() method in Java; most likely it is also available in Python. It returns the source of the current page as a string, so you can do whatever you want with it.
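In Python, a minimal sketch could look like the following. It assumes the Selenium RC Python client (the selenium class with start, open, wait_for_page_to_load, get_html_source and stop, as I recall them from the RC docs) and a Selenium server already running. The host, port, browser string, URL and output file name are placeholders, and save_page_source is just a helper name made up here.

```python
import io


def save_page_source(sel, path, out_file):
    """Open `path` in the browser driven by `sel`, save the HTML, and return it."""
    sel.open(path)
    sel.wait_for_page_to_load("30000")  # timeout in milliseconds, passed as a string
    html = sel.get_html_source()        # source of the current page as a string
    with io.open(out_file, "w", encoding="utf-8") as fh:
        fh.write(html)
    return html


def main():
    # Assumption: Selenium RC Python client and a server on localhost:4444.
    from selenium import selenium

    sel = selenium("localhost", 4444, "*firefox", "http://example.com/")
    sel.start()
    try:
        html = save_page_source(sel, "/", "page.html")
        print(len(html))  # from here the string can be parsed like any other
    finally:
        sel.stop()        # closes the browser window
```

Call main() once the server is up. The helper takes the session as an argument, so you can reuse one browser session for several pages instead of starting a new one each time.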

ZloiAdun
Yes, I found this info in the Selenium RC Python docs. I had forgotten to look there, as I had been looking at some other Selenium API, which I guess was outdated, since it didn't seem to have a method for this. The simple answer is to just look in the Selenium RC docs under your language =). It's a command like this: ret = sel.get_string("getBodyText", [])
Rick