Hi, I have been building a web scraper in PHP for an internal application, but one of the pages uses a JavaScript login. Is there any way to log in automatically so I can scrape the data as usual?

(I am using curl to log in to the other two sites)

+2  A: 

Use Firebug to see what the browser sends to the server. Then you can make the same requests with curl.

Riateche
Thanks buddy, I have only been doing PHP for a day, so it's all a bit new :)
Ross Alexander
A: 

There are many ways to implement a JavaScript login interface. Your question does not provide enough information to answer definitively.

Most JavaScript login interfaces simply log in over AJAX: an asynchronous POST request that contains the login info. That can be faked by sending the proper headers. Install a browser plugin that lets you monitor HTTP(S) requests and you'll be able to see which headers and form data to send.
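As a rough illustration, here is a minimal sketch of replaying such a login with PHP's cURL extension. The URL, the field names, and the `X-Requested-With` header are assumptions for the example, not details of your application; replace them with whatever the network monitor actually shows. The helper names `buildLoginOptions` and `login` are hypothetical.

```php
<?php
// Builds the cURL options for a POST that mimics an AJAX login.
// All concrete values passed in are placeholders (assumptions).
function buildLoginOptions(string $url, array $fields, string $cookieJar): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        // URL-encode the form fields the same way a browser form would.
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        // Many AJAX libraries send this header; copy the real ones you observe.
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest'],
        // Store and resend session cookies so later requests stay logged in.
        CURLOPT_COOKIEJAR      => $cookieJar,
        CURLOPT_COOKIEFILE     => $cookieJar,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Sends the login request and returns the response body.
function login(string $url, array $fields, string $cookieJar): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildLoginOptions($url, $fields, $cookieJar));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }
    unset($ch); // releasing the handle flushes the cookies to the jar file
    return $body;
}
```

You would call something like `login('https://intranet.example/login', ['username' => '...', 'password' => '...'], $jar)` once, then reuse the same cookie jar file for the requests that fetch the pages you want to scrape.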

Lèse majesté
The other guy seemed to answer it easily enough, but thanks anyway
Ross Alexander
That will not always work. Some login scripts use a security token specifically so that just repeating the request will not work. There could also be other interactions designed to prevent (or at least make more difficult) web scraping.
Lèse majesté
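If the login does use a security token, one common arrangement (an assumption here, since the thread does not say how this particular page works) is a hidden form field whose value must be fetched fresh and sent back with the credentials. A minimal sketch of pulling such a value out of the login page's HTML follows; the helper name `extractHiddenField` and the field name `csrf_token` are hypothetical.

```php
<?php
// Returns the value of the hidden <input> with the given name, or null if absent.
function extractHiddenField(string $html, string $name): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely well-formed; collect parse warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//input[@type="hidden"]') as $input) {
        if ($input->getAttribute('name') === $name) {
            return $input->getAttribute('value');
        }
    }
    return null;
}
```

The idea would be to GET the login page first with the same cookie jar, call `extractHiddenField($html, 'csrf_token')`, and add the result to the POST fields. Tokens delivered some other way (in a script block, a response header, or a cookie) would need a different extraction step.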