Hi All,

I am writing a small web spider for a website that uses a lot of JavaScript for links:

<htmlTag onclick="someFunction();">Click here</htmlTag>

where the function looks like:

function someFunction() {
  var _url;
  ...
  // _url constructed, maybe with reference to a value in the HTML doc
  // and/or a value passed as argument(s) to this function
  ...
  window.location.href = _url;
}

What is the best way to evaluate this function server-side so that I can construct the value of _url?

A: 

Not exactly sure what you're trying to accomplish.

If you need to send these values to the server for processing, Ajax would be your best option.

Chris Pebble
Simply spider a site and download all of its pages. This is easy with <a href=..>. However, the site I need to spider uses a lot of onclick handlers. So, server-side, I need to evaluate the onclick function and get the URL, so that I can GET that page.
A: 

This could be messy to do, and it depends on several things:

  1. Where is the link stored? Inside the element, in a JavaScript variable, etc.
  2. Is the JavaScript function always your own?

One approach that could do the trick is to simply parse your HTML and use a regex to catch HTTP links wherever the onclick="someFunction();" attribute is present.

Boris Guéry
Simply regexing might work. I am hoping for a more general solution, though. I know there are various JavaScript interpreters out there (e.g. Rhino), but they don't have the ability to reference a DOM, which I need in this case, as some components of the _url are taken from HTML elements (productId = document.getElementById("prodid").innerHTML; _url = productId + blah).
Do you mean that your URLs are built from the page content? If so, short of interpreting the HTML/JavaScript/DOM with a third-party tool (I don't know of one), I don't see any good way to do it. However, if you are able to change the generated HTML, you should use plain <a href=""> links and then use JavaScript on top of them to do what you want, if only for accessibility.
Boris Guéry
Thinking about it some more: if you know how the JS function works and where it gets the information to build the URL, you could parse the entire HTML and, still with regex, catch all the elements you need to build the URL properly on the server side. It could be messy, but it should work!
Boris Guéry
This might be quickest in the short term. Google must do something similar, though; they seem to be able to parse JavaScript for links (although, granted, Google does have a fair amount of resources at its disposal).
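The regex approach discussed in the comments above could be sketched roughly as follows. This is only an illustration under stated assumptions: the element id "prodid" comes from the question's comment, while the sample markup, the "/product/" URL scheme, and the helper names (extractById, extractOnclickCalls, buildUrl) are hypothetical. It re-implements by hand what someFunction is assumed to do, so it only works when you already know how the handler builds its URL.

```javascript
// Hypothetical page fragment; only the id "prodid" is taken from the thread.
const html = `
  <span id="prodid">12345</span>
  <div onclick="someFunction();">Click here</div>
`;

// Return the text content of the element with the given id.
// Naive on purpose: assumes double-quoted attributes, no nested tags,
// and an id that contains no regex metacharacters.
function extractById(source, id) {
  const pattern = new RegExp(`<[^>]*\\bid="${id}"[^>]*>([^<]*)<`);
  const match = pattern.exec(source);
  return match ? match[1].trim() : null;
}

// Collect the names of functions called at the start of onclick attributes.
function extractOnclickCalls(source) {
  const calls = [];
  const pattern = /onclick="\s*([A-Za-z_$][\w$]*)\s*\(/g;
  let match;
  while ((match = pattern.exec(source)) !== null) {
    calls.push(match[1]);
  }
  return calls;
}

// Hand-written equivalent of what someFunction is assumed to compute.
function buildUrl(productId) {
  return "/product/" + encodeURIComponent(productId); // assumed URL scheme
}

const productId = extractById(html, "prodid");
const handlers = extractOnclickCalls(html);
const url =
  handlers.includes("someFunction") && productId !== null
    ? buildUrl(productId)
    : null;
```

Regexes like these are brittle against real-world HTML, so a proper HTML parser is probably a safer choice for anything beyond a single, well-understood site.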
A: 

If you need server-side processing, you need to either:

  1. Do the processing before the content is delivered to the user, and include its output in the response, or
  2. Use something like AJAX to make a new request back to the server
Brian
+1  A: 

You could also use env.js and Rhino to actually evaluate the JavaScript in the HTML and detect changes to the location object after manually firing a click event.

Jason Harwig
This looks interesting; I will check it out, thanks.
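The general idea in the answer above, evaluating the page's JavaScript and then inspecting the location object, can be illustrated without env.js or Rhino. The sketch below swaps in Node's built-in vm module and a hand-stubbed document and window instead of a real DOM emulation, so it is not the env.js API. The body of someFunction, the "/product/" URL scheme, and the helper names (makeSandbox, resolveOnclick) are assumptions made for illustration.

```javascript
const vm = require("vm");

// Script as it might be extracted from the page. The function body is a
// guess based on the question's description, not the real site's code.
const pageScript = `
  function someFunction() {
    var productId = document.getElementById("prodid").innerHTML;
    var _url = "/product/" + productId;
    window.location.href = _url;
  }
`;

// Build minimal stand-ins for the browser objects the script touches.
// elementsById maps element ids to their innerHTML strings.
function makeSandbox(elementsById) {
  const location = { href: "about:blank" };
  return {
    document: {
      getElementById: (id) =>
        Object.prototype.hasOwnProperty.call(elementsById, id)
          ? { innerHTML: elementsById[id] }
          : null,
    },
    location,
    window: { location }, // shares the same location object
  };
}

// Run the page script, "fire" the onclick code, and report where
// the handler tried to navigate.
function resolveOnclick(script, onclickCode, elementsById) {
  const sandbox = makeSandbox(elementsById);
  vm.createContext(sandbox);
  vm.runInContext(script, sandbox);      // defines the page's functions
  vm.runInContext(onclickCode, sandbox); // executes the handler body
  return sandbox.location.href;
}

const target = resolveOnclick(pageScript, "someFunction();", { prodid: "12345" });
```

Here the stubs cover only getElementById and location, so a real page would need far more of the DOM emulated, which is the gap a library like env.js aims to fill. Node's vm module is also not a security boundary, so scripts from untrusted sites should not be run this way without further isolation.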