BeautifulSoup get innerhtml data

+2 Q:

BeautifulSoup get innerhtml data

I am trying to read data from a website. I can see the value I need in the browser, but it does not appear in the HTML I download with urllib2. The value is generated by a JavaScript file and inserted into the page as the innerHTML of the element with that id. How can it be extracted? Unlike a browser, fetching the raw source does not execute the JavaScript.

+1 A:

You have two options: have the browser save the DOM (which includes all changes made by scripts), or use a JavaScript engine to execute the embedded scripts.

For the latter route, try a Java-based engine like Rhino and emulate the browser with env.js.
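For the first route, once the browser has saved the rendered DOM to a file, pulling the value out by id is fairly simple. The following is only a sketch using Python's standard-library html.parser as a stand-in for BeautifulSoup; the id "price", the sample markup, and the helper name text_by_id are hypothetical and not taken from the question.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TextById(HTMLParser):
    """Collects the text found inside the element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # 0 means we are outside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html, element_id):
    """Return the text inside the element with element_id ('' if empty or absent)."""
    parser = _TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical samples: what urllib2 downloads vs. what the browser saved.
RAW_SOURCE = '<html><body><div id="price"></div></body></html>'
SAVED_DOM = '<html><body><div id="price"><b>42</b> USD<br></div></body></html>'
```

Calling text_by_id on the raw source finds nothing inside the element, while the saved DOM contains the script-generated value. With BeautifulSoup installed, something like BeautifulSoup(saved_dom, "html.parser").find(id="price") should serve the same purpose.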

Aaron Digulla 2010-07-08 08:34:22

How do I automatically save the browser's DOM? Thanks for the emulator method, but do you know of a pythonic way of doing this?

zubinmehta 2010-07-08 08:58:29

Try Selenium as suggested by mamoo.

Aaron Digulla 2010-07-08 14:50:06

+4 A:

Another way to get the data is to let the browser do all the work through Selenium and then read the rendered HTML. It is a bit slow, but it works.

Here you can find a getting started guide for using Selenium with Python: http://jimmyg.org/blog/2009/getting-started-with-selenium-and-python.html
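A minimal sketch of that approach, assuming the current Selenium Python bindings (version 4) with Firefox and geckodriver installed. The URL, the id "price", and both helper names are hypothetical, and the polling loop is just one simple way to wait for the script to fill the element.

```python
import time


def inner_html_by_id(driver, url, element_id, timeout=10.0, poll=0.5):
    """Load url in the given browser and return the element's innerHTML.

    Polls until the innerHTML is non-empty or the timeout expires, since the
    script may fill the element some time after the page has loaded.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "id" is the locator strategy string that Selenium's By.ID stands for;
        # using the string avoids importing Selenium in this helper.
        element = driver.find_element("id", element_id)
        html = element.get_attribute("innerHTML")
        if (html and html.strip()) or time.monotonic() >= deadline:
            return html
        time.sleep(poll)


def fetch_with_firefox(url, element_id):
    """Hypothetical convenience wrapper: start Firefox, read the value, quit."""
    from selenium import webdriver  # imported lazily; needs Selenium installed

    driver = webdriver.Firefox()
    try:
        return inner_html_by_id(driver, url, element_id)
    finally:
        driver.quit()  # always close the browser, even on errors
```

For example, fetch_with_firefox("http://example.com/", "price") would open the page in a real browser and return whatever the script wrote into that element. If the element is missing entirely, find_element raises Selenium's NoSuchElementException. The full rendered page is also available as driver.page_source if you prefer to hand it to BeautifulSoup.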

mamoo 2010-07-08 09:34:18
