I need to call a web page that has javascript. At the bottom of the page I have the following:

  <noscript>
    <p>Javascript is not supported or enabled.</p>
  </noscript>

When I make my HttpWebRequest request like so, it is clear that the javascript on the page did not execute.

Dim req As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(New Uri(url)), System.Net.HttpWebRequest)
' Forward the current user's forms-authentication cookie so the page is requested as the same user
Dim cookie As HttpCookie = HttpContext.Current.Request.Cookies(FormsAuthentication.FormsCookieName)
req.CookieContainer = New System.Net.CookieContainer()
If cookie IsNot Nothing Then ' the cookie is absent when the current user is not logged in
    Dim authenticationCookie As New System.Net.Cookie(FormsAuthentication.FormsCookieName, cookie.Value, cookie.Path, HttpContext.Current.Request.Url.Authority)
    req.CookieContainer.Add(authenticationCookie)
End If

req.MediaType = "PRINT"
req.Method = "GET"
req.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"

' Returns the HTML exactly as the server sent it; none of the page's scripts are run
Dim res As System.Net.WebResponse = req.GetResponse()

What can I do? The response is not useful to me if the javascript did not run. I want to convert the output into a PDF. I guess I need a way to execute the javascript that is included in the response, but do so outside of the browser.

Thanks.

A: 

Fix the page so it doesn't depend on JavaScript. Build on things that work.

David Dorward
A: 

What output do you want to convert? You can only scrape the static HTML, not the JavaScript-modified DOM.

Remember that HttpWebRequest does not interpret JavaScript.

Daniel Vassallo
A: 

Javascript executes on the user-agent (client-side). You are providing a false user-agent string for the request. The user-agent you are "pretending" to be has a Javascript implementation. HttpWebRequest, of course, does not.

Josh Stodola
A: 

I guess I need a way to execute the javascript that is included in the response, but do so outside of the browser.

You'll need to write your own JavaScript interpreter, then.

The only alternatives I can think of are using a web browser engine such as WebKit or Gecko to render the page for you on the server side, or finding an online service like Browsershots that will render the page for you.

Li0liQ
A: 
  1. Use the HttpWebRequest as you already do.
  2. After GetResponse and GetResponseStream, save the stream content to a temporary file (e.g. using a filename from the Path.GetTempFileName() method).
  3. Load it in the WebBrowser class.
  4. Let the page execute for a while.
  5. Walk the web browser instance's representation of the DOM to get what you want.
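A minimal VB.NET sketch of steps 2 to 5, assuming a Windows machine with Windows Forms available (the WebBrowser control hosts Internet Explorer's engine). The module and function names (ScriptedPageRenderer, SaveResponseToTempFile, RenderWithScripts) are hypothetical. The temporary file is given an .html name instead of the .tmp name that Path.GetTempFileName() produces, so the control is more likely to treat it as HTML, and the settle time is an arbitrary stand-in for "a while". Because the page is loaded from a local file, scripts and stylesheets referenced by relative URLs will not resolve unless the page uses absolute URLs or a base element.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Threading
Imports System.Windows.Forms

' Hypothetical helper module sketching steps 2-5 above.
Module ScriptedPageRenderer

    ' Step 2: copy the response body into a temporary file with an .html extension.
    Function SaveResponseToTempFile(ByVal res As WebResponse) As String
        Dim tempPath As String = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() & ".html")
        Using body As Stream = res.GetResponseStream(),
              output As FileStream = File.Create(tempPath)
            body.CopyTo(output)
        End Using
        Return tempPath
    End Function

    ' Steps 3-5: load the file in a WebBrowser control, let the page's scripts
    ' run for settleMilliseconds, then return the script-modified DOM as HTML.
    Function RenderWithScripts(ByVal htmlPath As String, ByVal settleMilliseconds As Integer) As String
        Dim result As String = Nothing
        Dim work As ThreadStart =
            Sub()
                Using browser As New WebBrowser()
                    browser.ScriptErrorsSuppressed = True
                    browser.Navigate(New Uri(htmlPath))
                    ' Pump Windows messages until the document has finished loading.
                    While browser.ReadyState <> WebBrowserReadyState.Complete
                        Application.DoEvents()
                        Thread.Sleep(10)
                    End While
                    ' Step 4: keep pumping messages so timers and deferred scripts can run.
                    Dim stopAt As DateTime = DateTime.UtcNow.AddMilliseconds(settleMilliseconds)
                    While DateTime.UtcNow < stopAt
                        Application.DoEvents()
                        Thread.Sleep(10)
                    End While
                    ' Step 5: read the live DOM rather than the original file contents.
                    result = browser.Document.GetElementsByTagName("html").Item(0).OuterHtml
                End Using
            End Sub
        ' The WebBrowser control wraps an ActiveX control, so it needs an STA thread.
        Dim worker As New Thread(work)
        worker.SetApartmentState(ApartmentState.STA)
        worker.Start()
        worker.Join()
        Return result
    End Function

End Module
```

With the question's code, these helpers would be called after req.GetResponse(): pass res to SaveResponseToTempFile, close the response, then pass the returned path and a settle time (for example 2000 milliseconds) to RenderWithScripts, and hand the resulting HTML to the PDF converter.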

Hope this helps.

chakrit
A: 

Hey, I have the same problem. Has anyone resolved this? Can you give me a source code example, or some guidelines?

Thanks,

eldo