I am trying to write a PHP script that will get the source code of a page in my Amazon account. However, to reach that page, I must be logged in. From what I understand, I should be able to accomplish this by posting the correct request headers, and then capturing the HTML response. Is that correct? If so, I'd really appreciate it if someone could explain to me how exactly I would do this. If it's not right, I'd love to hear the correct way of doing it!

I've used Firebug to get the request and response headers I need. It's just a matter of what to do with them now. I read elsewhere on this site that you can't send a request with the PHP post method, and that perhaps using cURL is the way to go. I really know nothing about cURL, so the more info the better.

Also, feel free to point me to some useful tutorials on this topic.

Thanks!

Max

+2  A: 

You'll probably need to log in first using cURL, get the cookies with the session ID, then re-use those cookies in the following request to the actual page you need.

That's how browsers work, re-sending cookies every time. You should mimic that.
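As a rough sketch of that flow (not a tested Amazon login): the helper names `session_options`, `session_request` and `fetch_protected_page` are made up for illustration, and the URLs and form field names you pass in are placeholders that you would replace with the ones Firebug recorded. The cookie handling relies on cURL's `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE` options pointing at the same file.

```php
<?php
// Sketch only. The login URL and form field names are assumptions;
// take the real ones from the requests captured in Firebug.

// Build the cURL options for one request that shares a cookie file.
function session_options(string $url, string $cookieFile, ?array $postFields = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects, e.g. after a login
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies received are saved to this file
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies saved earlier are sent from it
    ];
    if ($postFields !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields);
    }
    return $options;
}

// Run one request and return the response body as a string.
function session_request(string $url, string $cookieFile, ?array $postFields = null): string
{
    $ch = curl_init();
    curl_setopt_array($ch, session_options($url, $cookieFile, $postFields));
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch); // closing the handle writes the cookie jar to disk
    return $body;
}

// Step 1: post the login form. Step 2: fetch the page with the same cookies.
function fetch_protected_page(string $loginUrl, array $loginFields, string $pageUrl): string
{
    $cookieFile = tempnam(sys_get_temp_dir(), 'cookies');
    session_request($loginUrl, $cookieFile, $loginFields);
    return session_request($pageUrl, $cookieFile);
}
```

You would call it along the lines of `fetch_protected_page('https://example.com/login', ['email' => '...', 'password' => '...'], 'https://example.com/account')`, with every value there being a placeholder. Real login forms often include hidden fields or tokens as well, so you may need to fetch the login page first and pass those along.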

Seb
Please note that while this approach will work, it might be against Amazon's TOS, EULA, legal mumbo-jumbo, et cetera, to scrape content off their service. It is especially likely to be noticed if you plan on making rapid requests, faster than a human would be able to make. Please tread lightly.
Foxtrot
Also, your hosting provider might not allow you to do this (to stop spammers using their service for the reasons Foxtrot gave) and may have disabled cURL.
Kurucu
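A quick way to find out whether cURL is available on a given host is to check for the extension before relying on it. This is a minimal sketch; `curl_available` is a hypothetical helper name, and it only checks that the extension is loaded, not whether outbound requests are otherwise blocked.

```php
<?php
// Returns true when the cURL extension is loaded and curl_init() is callable.
// Hosts can also disable individual functions via disable_functions in php.ini,
// which is why function_exists() is checked as well.
function curl_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

echo curl_available() ? "cURL is available\n" : "cURL is not available\n";
```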
Thanks for the comments. The legality issue has definitely crossed my mind, and is something I've been planning on looking into once I figured out exactly what it is I would be doing :)
Maxwell