Given a remote page:

http://example.com/paged_list.aspx

which uses JavaScript function calls to display several pages of tabular data:

javascript:show_page(1)
javascript:show_page(2)

and so on. Users click the page links to display each page, which triggers a reload with no query string, i.e. the URI stays the same.

When scraping this site, it would be useful to fetch the subsequent pages, but there is no obvious way to specify a page number in the request (made with file_get_contents()).

Is there any way to:

  1. Open a remote web address.
  2. Call a known javascript function at that address.
  3. Return the results?
A:

Emulating JS in PHP would be the tough route. It is much easier to analyze the JS source and work out the target URL of the background AJAX request. It should then be fairly easy to pull the entire data set into your PHP script by calling that URL and modifying the arguments as needed.
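A minimal sketch of that approach, written in Python rather than PHP, assuming the JS source reveals a hypothetical endpoint `http://example.com/get_page.aspx` that takes a hypothetical `page` argument (the real URL and parameter names would come from reading the script):

```python
from urllib.parse import urlencode, urlunsplit

# Hypothetical endpoint discovered by reading the page's JavaScript;
# the real host, path and parameter names will differ.
ENDPOINT_HOST = "example.com"
ENDPOINT_PATH = "/get_page.aspx"


def page_urls(first, last, extra_args=None):
    """Build one request URL per page number, varying only the page argument."""
    urls = []
    for page in range(first, last + 1):
        args = dict(extra_args or {})
        args["page"] = page  # hypothetical parameter name
        query = urlencode(args)
        urls.append(urlunsplit(("http", ENDPOINT_HOST, ENDPOINT_PATH, query, "")))
    return urls


if __name__ == "__main__":
    for url in page_urls(1, 3):
        # Each URL could then be fetched, e.g. urllib.request.urlopen(url).read()
        print(url)
```

Only the URL construction is shown here. Fetching each URL is then a single `urllib.request.urlopen(url).read()` call, or `file_get_contents()` on the PHP side.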

nathan
It actually turned out to be an ASP.NET feature I was unfamiliar with, __doPostBack. Once I read up on how it worked, I was able to mimic its behaviour by using libcurl to submit POST data. Thanks, all!
EloquentGeek
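A rough sketch of what mimicking `__doPostBack` involves, again in Python rather than PHP. It assumes the usual ASP.NET WebForms behaviour: `__doPostBack(eventTarget, eventArgument)` fills the hidden `__EVENTTARGET` and `__EVENTARGUMENT` fields and submits the page's form. A scraper can therefore re-post every hidden field (such as `__VIEWSTATE` and `__EVENTVALIDATION`) with those two set. `HiddenInputParser` and `build_postback` are hypothetical helpers, the target/argument values are whatever the real pager links pass, and the form is assumed to post back to the same URL:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements (e.g. __VIEWSTATE)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Visible inputs (text boxes, selects) are ignored in this sketch.
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(page_url, html, event_target, event_argument=""):
    """Mimic __doPostBack(eventTarget, eventArgument): re-post the page's hidden
    fields with __EVENTTARGET/__EVENTARGUMENT filled in. Nothing is sent yet."""
    parser = HiddenInputParser()
    parser.feed(html)
    fields = dict(parser.fields)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    body = urlencode(fields).encode("ascii")
    return Request(
        page_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The returned request can be sent with `urllib.request.urlopen(req)`; some sites may also expect the session cookie from the first GET to be sent back. From PHP, the same form-encoded body could be passed to curl via `CURLOPT_POSTFIELDS`.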
Can you post your solution?
Paul Schreiber
A: 

Your best bet would be to reverse-engineer the JavaScript function and the AJAX calls it makes to their server-side script, so that you can send your own request to their server with the correct arguments.

Many Firefox add-ons can make reversing all of that easier (e.g. Firebug, by watching the network activity).

Chetane