
Hello, I have an HTML page containing JavaScript that generates some of its content, and I need to parse that generated content from a Python script. I have saved a copy of the file on my computer. Is there any way to work with the 'already generated' HTML, i.e. what I see in the browser after opening the page file? As I understand it, I have to work with the DOM (maybe with the xml2dom lib).

A: 

I think you may have a fundamental misunderstanding about what runs where: by the time JavaScript generates the content (on the client side), the server-side processing of the document has already finished. There is no direct way for a server-side Python script to access HTML created by JavaScript; that HTML exists only "virtually", in the browser's DOM.

You would have to find a way to transmit that HTML to your Python script, most likely via Ajax: take the HTML and send it as a parameter of the Ajax call. (Use POST as the request method so you don't run into the URL length limits of GET.)

An example using jQuery's AJAX functions:

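// Send the browser's current (JavaScript-modified) DOM to the server.
$.ajax({
  url: "myscript.py",
  type: "POST",
  // outerHTML serializes the live DOM, including generated content.
  data: { html: document.documentElement.outerHTML },
  success: function () {
    alert("sent HTML to python script!");
  }
});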
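On the Python side, something has to accept that POST. Here is a minimal sketch using only the standard library. It assumes the page is served from the same origin as a small Python HTTP server (otherwise the browser's same-origin policy would block the request) rather than relying on a CGI script like myscript.py. The names HtmlReceiver, extract_html and serve are hypothetical. jQuery encodes a plain-object data argument as application/x-www-form-urlencoded, so urllib.parse.parse_qs can decode it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def extract_html(body: bytes) -> str:
    """Pull the 'html' field out of a form-encoded POST body."""
    fields = parse_qs(body.decode("utf-8"))
    # parse_qs maps each key to a list of values; take the first one.
    return fields.get("html", [""])[0]


class HtmlReceiver(BaseHTTPRequestHandler):
    """Hypothetical handler that accepts the HTML posted by the browser."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        html = extract_html(self.rfile.read(length))
        # Hand the rendered HTML to your own parsing code here.
        print(f"received {len(html)} characters of HTML")
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (this call blocks)."""
    HTTPServer(("", port), HtmlReceiver).serve_forever()
```

Calling serve(8000) starts a blocking server on port 8000. Replace the print call with your own processing.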
Pekka
A: 

Have you saved "the file" (web page, I imagine) before or after Javascript has altered it?

If "after", then it no longer matters that some of the HTML was generated by Javascript: you can just use popular parsers like lxml or BeautifulSoup to handle the HTML you have.
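Once the saved file already contains the generated markup, any HTML parser will do. The sketch below is dependency-free and uses the standard library's html.parser module in place of lxml or BeautifulSoup, swapped in only so that it runs without third-party packages. As an example task, it collects every link href. LinkCollector, extract_links and links_in_saved_page are hypothetical names. With BeautifulSoup installed, the equivalent would be [a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)].

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> list:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def links_in_saved_page(path: str) -> list:
    """Read a page saved *after* JavaScript ran and list its links."""
    return extract_links(Path(path).read_text(encoding="utf-8"))
```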

If "before", then first you need to let Javascript do its work by automating a real browser; for that task, I would recommend SeleniumRC -- which brings you back to the "after" case;-).
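For the "before" case, here is a sketch of the browser-automation step. It assumes the current Selenium 4 WebDriver API (the successor to SeleniumRC) and a local Firefox install. The helper names are hypothetical. page_url turns the saved file's path into a file:// URL, and render_page returns driver.page_source, which serializes the DOM as it is after scripts have run. Note that driver.get() only waits for the page's load event, so content that scripts add later may need an explicit WebDriverWait.

```python
from pathlib import Path


def page_url(path: str) -> str:
    """Turn a local file path into a file:// URL a browser can open."""
    return Path(path).resolve().as_uri()


def render_page(driver, url: str) -> str:
    """Load url in a real browser and return the DOM after scripts ran.

    driver is a Selenium WebDriver (or any object with .get() and .page_source).
    """
    driver.get(url)  # returns once the page's load event has fired
    # page_source reflects the current DOM, including JavaScript changes.
    return driver.page_source


def make_headless_firefox():
    """Start a headless Firefox via Selenium 4 (needs selenium + Firefox)."""
    from selenium import webdriver  # imported lazily: optional dependency

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)
```

For example, using a hypothetical file name: driver = make_headless_firefox(); html = render_page(driver, page_url("saved_page.html")); driver.quit(). The resulting string can then go to any HTML parser.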

Alex Martelli
+1 I think you got the question better than I did. I'm leaving my answer in place anyway in case somebody needs it.
Pekka
Yeah, 'before'. But my script has to run automatically about once a minute. Can I implement this with SeleniumRC?
Ockonal
@Ockonal, if you have powerful-enough machines with lots of RAM, sure: with today's newest, fastest browsers, Javascript runs pretty fast, and Selenium adds little overhead to that.
Alex Martelli
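For the run-every-minute requirement, one way to keep the overhead down is to start the browser once and reuse it for every run rather than launching a new one each minute. Here is a minimal, hypothetical scheduling sketch. poll calls a fetch function (for example, one that reloads the page in a long-lived WebDriver and returns its page source) and hands each result to a handle function. It then sleeps for whatever remains of the interval.

```python
import time
from typing import Callable, Optional


def poll(fetch: Callable[[], str],
         handle: Callable[[str], None],
         interval_s: float = 60.0,
         rounds: Optional[int] = None) -> int:
    """Call fetch() about every interval_s seconds and pass the result to handle().

    Runs forever when rounds is None; returns the number of rounds completed.
    """
    done = 0
    while rounds is None or done < rounds:
        started = time.monotonic()
        handle(fetch())
        done += 1
        if rounds is None or done < rounds:
            # Sleep only for what is left of the interval, so slow renders
            # don't push the schedule back further each round.
            elapsed = time.monotonic() - started
            time.sleep(max(0.0, interval_s - elapsed))
    return done
```

Passing rounds=None runs the loop indefinitely. If the job must survive crashes or reboots, cron or a systemd timer may be a sturdier choice than a long-running loop.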