Now I need a way to get HTML content from a browser. When a page is rendered in a browser, its JavaScript runs; when the page is merely downloaded, the JavaScript doesn't run. So HTML libraries like lxml, BeautifulSoup and others won't work. I found a project named pywebkitgtk, but its purpose is to create a browser with a front end. Is there any way to give a URL to a "fake browser", have it render the page and run all of its JavaScript, and save the result to an HTML file? I don't need any front end; a back end is enough. Thanks, guys.

I almost forgot: I need to do this in Python or Java.

A:

selenium-rc lets you drive an actual browser for this purpose, under the control of any of several languages of your choice, including both Python and Java. Check it out!

For a detailed example of use with Python, see here.

Alex Martelli
I don't get it. So Selenium RC accepts a URL and returns the HTML as rendered by whichever browser I choose, am I right?
davidx
@davidx, that's just the start: the rendering includes JavaScript execution, and then you can get the resulting page's body as HTML with the `get_html_source` method of the selenium class. Even that is only the beginning, since you can also interact with the page if needed (send mouse clicks and so on). But I gather that all you want is for the JS to execute when the page loads and then to get the HTML source, and selenium-rc makes that really easy with any of the many browsers it lets you control.
Alex Martelli
that is great, thanks, Alex!
davidx
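Selenium RC has since been superseded by Selenium WebDriver, so here is a minimal sketch of the same idea (load a URL in a real browser, let its JavaScript run, save the resulting HTML) using WebDriver's Python bindings instead of the RC `get_html_source` call from the answer. It assumes the `selenium` package (version 4) plus Firefox and geckodriver are installed; the helper names `make_headless_firefox` and `save_rendered_html` are hypothetical, not part of any library. Headless mode gives the "back end only" browser the question asks for.

```python
from pathlib import Path


def make_headless_firefox():
    """Start a headless Firefox via Selenium WebDriver (assumed installed)."""
    # Imported here so this module still loads where selenium is not installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # no visible window: back end only
    return webdriver.Firefox(options=options)


def save_rendered_html(url, out_path, driver_factory=make_headless_firefox):
    """Load `url` in a real browser, let its JavaScript run, save the DOM as HTML."""
    driver = driver_factory()
    try:
        # With the default page-load strategy, get() returns once the page has
        # finished loading, so scripts that run during page load have executed.
        driver.get(url)
        # page_source serializes the browser's *current* DOM, not the raw
        # response from the server, so changes made by JavaScript are included.
        html = driver.page_source
    finally:
        driver.quit()  # always shut the browser down, even if loading failed
    Path(out_path).write_text(html, encoding="utf-8")
    return html
```

Usage would look like `save_rendered_html("https://example.com/", "rendered.html")`. The `driver_factory` parameter is only there so another browser (or a stub in tests) can be swapped in. If a page keeps fetching content with JavaScript after the load event, you would likely need an explicit wait (for example Selenium's `WebDriverWait`) before reading `page_source`.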