tags:
views: 46
answers: 3

Hi,

Is it possible to get/scrape data from HTTPS links using PHP?

The HTTPS page asks for a username and password and serves data in XML format. So is it possible to get this data using PHP?

Can anyone suggest a procedure?

A: 

First of all, you need your web server and PHP to act as an HTTPS client; for that you need mod_ssl for Apache and OpenSSL support in PHP. Also, the service you are trying to access must have a defined way to send the username and password. That way you can log in and access the XML.
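Once PHP has OpenSSL support, the simplest approach needs no extra extension. A minimal sketch, assuming the service uses HTTP Basic authentication and that `allow_url_fopen` is enabled (the URL and credentials below are placeholders):

```php
<?php
// Placeholders: replace with the real endpoint and credentials.
$user = 'username';
$pass = 'password';
$url  = 'https://example.com/data.xml';

// Send the Basic auth header via a stream context.
$context = stream_context_create([
    'http' => [
        'header' => 'Authorization: Basic ' . base64_encode("$user:$pass"),
    ],
]);

$raw = file_get_contents($url, false, $context);
if ($raw === false) {
    die('Could not fetch the XML');
}

// Parse the payload with SimpleXML and walk the top-level elements.
$xml = simplexml_load_string($raw);
foreach ($xml->children() as $child) {
    echo $child->getName(), ': ', (string) $child, "\n";
}
```

If the service uses a login form or a token scheme instead of Basic auth, you will need the cURL extension to manage cookies or custom headers.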

Imran Naqvi
Why would you scrape your own site? Scraping usually implies that you do not have direct access to the data. I think the OP wants to scrape an external website.
Russell Dias
@Russell Dias, it's not even scraping; he is trying to read an XML file over a secure connection using a username/password. I don't think there is anything wrong even if he is trying to read from an external source.
Imran Naqvi
Well, if he is trying to read from an external source, as you suggested, how is informing the OP of `mod_ssl` relevant?
Russell Dias
A: 

Yes, it is possible. Just use the cURL extension to fetch the page content and then parse it as you see fit (using the XML functions, regex, etc.). cURL can handle SSL, authentication, cookies, and so on. For more details see:

http://php.net/manual/en/book.curl.php
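A minimal cURL sketch along these lines, assuming the service accepts HTTP Basic authentication (the URL and credentials are placeholders):

```php
<?php
// Placeholder endpoint and credentials.
$ch = curl_init('https://example.com/data.xml');

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);          // return the body instead of printing it
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);      // authentication scheme (assumed Basic)
curl_setopt($ch, CURLOPT_USERPWD, 'username:password');  // credentials as user:pass
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);          // keep certificate verification on

$response = curl_exec($ch);
if ($response === false) {
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);

// Parse the fetched XML payload.
$xml = simplexml_load_string($response);
```

If the endpoint uses a different auth scheme (a login form, a token header), swap the `CURLOPT_HTTPAUTH`/`CURLOPT_USERPWD` pair for the appropriate cookie or header handling.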

hpuiu
@hpuiu, if your web server and PHP cannot act as a secure client, cURL cannot do anything on its own.
Imran Naqvi
@Imran Naqvi, I am afraid I don't quite understand your point.
hpuiu
cURL can very well act on its own. You do not need your server to act as a secure client to be able to access a secure site; cURL can skip certificate verification via the `curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);` option (though disabling peer verification is insecure and best avoided in production).
Russell Dias
A: 

You can use the http://simplehtmldom.sourceforge.net/ class: fetch the HTML from a URL and extract data from the DOM with jQuery-like syntax. It has been very useful to me.
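A short sketch of that approach, assuming `simple_html_dom.php` has been downloaded from the project page into the script's directory (the URL and selector are placeholders, and note this library targets HTML rather than raw XML):

```php
<?php
// Simple HTML DOM must be downloaded separately from the project page.
include 'simple_html_dom.php';

// Placeholder URL.
$html = file_get_html('http://example.com/');

// jQuery-like selectors: print every link's href attribute.
foreach ($html->find('a') as $link) {
    echo $link->href, "\n";
}

$html->clear();  // free the parsed DOM tree
```

For the OP's XML-over-HTTPS case, PHP's built-in SimpleXML or DOM extensions are likely a better fit than an HTML parser.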

arnaldex