How can I get html content from a browser that can do the html correction and js scripting?

I need a way to get HTML content as a browser would produce it. When a page is rendered in a browser, its JavaScript is run; otherwise it is not. So HTML libraries such as lxml, BeautifulSoup and the like won't work. I found a project named pywebkitgtk, but its purpose is to create a browser with a front end. Is there any way to feed a URL to a "fake browser", have it render the page and run all of its JavaScript, and save the result to an HTML file? I don't need a front end; a back end alone is fine. Thanks, guys.

I almost forgot: I need to do this in Python or Java.

+3 A:

selenium-rc lets you drive an actual browser for your purpose, under the control of any of several languages of your choice, including both Python and Java. Check it out!

For a detailed example of use with Python, see here.

Alex Martelli 2010-07-29 02:53:11

I don't get it. Selenium RC accepts a URL and returns the HTML as rendered by whichever browser I choose, am I right?

davidx 2010-07-29 03:21:11

@davidx, that's just the start: the rendering includes JS execution, and then you can get the resulting page's body as HTML with the `get_html_source` method of the selenium class. Even that is just the start, since you can interact with the page as needed, send mouse clicks, etc. But I gather that all you want is for the JS to execute upon loading and then to get the HTML source, and selenium-rc makes that really easy with any of the many browsers it lets you control.

Alex Martelli 2010-07-29 04:12:39
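
For readers following along, here is a minimal sketch of the flow described above, using the legacy Selenium RC Python client. It makes several assumptions not stated in the thread: a Selenium RC server is already running on localhost:4444, the `*firefox` launcher is available, and `http://example.com/` stands in for the real site. The helper names `fetch_rendered_html` and `save_rendered_html` are hypothetical, and the legacy `selenium.selenium` class only ships with old releases of the Python package (it was dropped in Selenium 3).

```python
import io


def fetch_rendered_html(sel, path="/", timeout_ms="30000"):
    """Load `path` in the browser driven by `sel` and return its HTML.

    `sel` is expected to be a legacy Selenium RC client object
    (selenium.selenium); this helper itself is a hypothetical wrapper.
    """
    sel.start()  # launch the browser session
    try:
        sel.open(path)  # path is relative to the base URL given to the client
        sel.wait_for_page_to_load(timeout_ms)  # wait for the page load to finish
        return sel.get_html_source()  # HTML after the page's JS has run
    finally:
        sel.stop()  # always close the browser session


def save_rendered_html(sel, out_file, path="/"):
    """Fetch the rendered HTML and write it to `out_file` as UTF-8."""
    html = fetch_rendered_html(sel, path)
    with io.open(out_file, "w", encoding="utf-8") as f:
        f.write(html)
    return html


# Example usage (assumes an RC server on localhost:4444 and an old
# selenium package that still includes the RC client):
#
#   from selenium import selenium
#   sel = selenium("localhost", 4444, "*firefox", "http://example.com/")
#   save_rendered_html(sel, "rendered.html")
```

Note that `wait_for_page_to_load` only waits for the page load itself; scripts that keep modifying the page afterwards may need an extra wait before `get_html_source` is called.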

That is great, thanks, Alex!

davidx 2010-07-29 05:38:02
