Hi,

I am currently running a PHP cron job that crawls through some HTML. I have now hit a case where I have to be logged in to access some of the data. How can this be achieved?

The cron job is running on a server which I don't have access to.

Basically, I am trying to access some HTML data which is only available after a user logs in. I have the login details but don't know how to use them from the cron job.

Cheers!

A:

By "user logs in", I suppose you mean "user would log in if they were using a browser"?

If so, the PHP script that's crawling through the HTML will need to:

  • POST the login data, as if it were filling in the form
  • Read the server's response and extract the session cookies from it
  • Send those cookies with every subsequent request
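
The three steps above can be sketched in plain PHP using the built-in HTTP stream wrapper. This is only a minimal sketch under assumptions: the function names (`cookies_from_headers`, `http_request`, `fetch_protected_page`) are made up for illustration, the cookie handling ignores attributes such as path, domain and expiry, and the login form is assumed to be a simple URL-encoded POST without hidden tokens.

```php
<?php
// Hypothetical helper: turn the Set-Cookie response headers into the value
// of a Cookie request header ("name=value; name2=value2").
function cookies_from_headers(array $headers): string
{
    $cookies = [];
    foreach ($headers as $header) {
        if (stripos($header, 'Set-Cookie:') === 0) {
            // Keep only "name=value"; drop attributes such as Path or Expires.
            $pair = trim(explode(';', substr($header, strlen('Set-Cookie:')), 2)[0]);
            [$name, $value] = array_pad(explode('=', $pair, 2), 2, '');
            $cookies[$name] = $value; // a later header overrides an earlier one
        }
    }
    $parts = [];
    foreach ($cookies as $name => $value) {
        $parts[] = $name . '=' . $value;
    }
    return implode('; ', $parts);
}

// Hypothetical helper: perform one HTTP request and return [headers, body].
function http_request(string $url, array $httpOptions): array
{
    $context = stream_context_create(['http' => $httpOptions]);
    $stream = @fopen($url, 'r', false, $context);
    if ($stream === false) {
        throw new RuntimeException("Request to $url failed");
    }
    // For the HTTP wrapper, wrapper_data holds the raw response header lines.
    $headers = stream_get_meta_data($stream)['wrapper_data'] ?? [];
    $body = stream_get_contents($stream);
    fclose($stream);
    return [$headers, $body];
}

function fetch_protected_page(string $loginUrl, array $formFields, string $pageUrl): string
{
    // 1. POST the form fields, as the browser would when submitting the form.
    [$headers] = http_request($loginUrl, [
        'method'          => 'POST',
        'header'          => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content'         => http_build_query($formFields),
        'follow_location' => 0, // stop at the login response so its cookies are the ones read
    ]);

    // 2. Extract the session cookies from the login response.
    $cookieHeader = cookies_from_headers($headers);

    // 3. Send those cookies along with the request for the protected page.
    [, $body] = http_request($pageUrl, [
        'method' => 'GET',
        'header' => "Cookie: $cookieHeader\r\n",
    ]);
    return $body;
}
```

From the cron script you would call `fetch_protected_page()` with the login URL, an array such as `['username' => ..., 'password' => ...]`, and the URL of the page to crawl. The array keys are placeholders here; they have to match the `name` attributes of the real form's inputs.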

You might be interested in using an existing library to make that easier.
For instance, take a look at Zend_Http_Client; the part of its documentation about Sending Multiple Requests With the Same Client will probably interest you ;-)


You might also want to take a look at other questions and answers on this topic.

Pascal MARTIN
A: 

You can use SimpleBrowser from SimpleTest for automating crawling. It's part of the SimpleTest framework, but it can be used on its own.

troelskn