I often check my accounts for different numbers. For example, I check my affiliate accounts for cash increases. I want to write a script that logs in to all these websites, grabs the money value for me, and displays it on one page. How can I program this?

+3  A: 

You should take a look at curl. With it you should be able to write a script that retrieves a webpage easily.

Also take a look at simplexml and dom; they will help you extract information from (X)HTML files.

Zend_Http could also be a good alternative to curl.
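
For example, a minimal sketch of fetching a page with curl (the URL is a placeholder for one of your account pages):

<?php
    // fetch a page with curl; the URL is a placeholder
    $ch = curl_init("http://example.com/account");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects

    $html = curl_exec($ch);
    if ($html === false) {
        die("curl error: " . curl_error($ch));
    }
    curl_close($ch);

    // $html now contains the page markup, ready for simplexml/dom
    echo $html;
?>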

Cheers

RageZ
+1  A: 

Well, it's sort of a vague question... I'd suggest the following steps:

  • send the login credentials via POST
  • grab and parse the response
  • do this for all relevant accounts / sites you want to check

If you face specific problems, feel free to comment on this answer.

EDIT: I agree with RageZ on the technical approach. curl would be the 'weapon of choice' for me too... ^^
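
A minimal sketch of the login step with curl (the URL and the field names "username"/"password" are hypothetical; check the site's actual login form for the real ones):

<?php
    // POST the credentials and keep the session cookie for later requests
    $ch = curl_init("http://example.com/login"); // hypothetical login URL
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
        'username' => 'joe',    // hypothetical field name
        'password' => 'secret', // hypothetical field name
    )));
    curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/cookies.txt");  // store session cookies
    curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookies.txt"); // send them back on later requests
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    $response = curl_exec($ch);
    curl_close($ch);

    // with the cookie file in place, a second curl request to the
    // account page will be made as a logged-in user
?>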

hth

K

KB22
A: 

First of all, check whether the services you want to log in to have APIs.
That would be much easier, as an API is a format specifically made for getting the data and using it in another application.

If there is an API, you can look at its documentation to see how to retrieve and use the data.
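
For instance, assuming a hypothetical JSON endpoint (the URL and the 'balance' field are made up), reading a value can be this simple:

<?php
    // hypothetical JSON API call; the endpoint and field name are assumptions
    $json = file_get_contents("https://api.example.com/v1/balance?api_key=YOUR_KEY");
    $data = json_decode($json, true);
    echo "Balance: " . $data['balance'];
?>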

If there isn't one, you need to scrape the HTML pages.
You can start by taking a look at curl: http://php.net/curl
The idea is to simulate your own visit to the website by sending the login POST request and getting the returned data.

After retrieving the page's data, you can parse it with tools like dom: http://php.net/dom
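
For example, assuming $html holds a fetched page and the value sits in a span with the hypothetical id "balance":

<?php
    // parse the balance out of fetched HTML with dom and XPath
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ suppresses warnings from malformed real-world HTML
    $xpath = new DOMXPath($dom);

    // the id "balance" is an assumption; inspect the real page to find the element
    $nodes = $xpath->query('//span[@id="balance"]');
    if ($nodes->length > 0) {
        echo "Balance: " . trim($nodes->item(0)->nodeValue);
    }
?>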

Damien MATHIEU
A: 

Use TestPlan; it was designed as a web automation system and makes such tasks very simple.

edA-qa mort-ora-y
A: 

Do you want to grab the data manually by coding it? You can have it done automatically instead. You can use Automation Anywhere; it's a data extraction tool. I use it for the same kind of purpose you're describing. Some details on this web data grabber.

Bob
A: 

I would really have a look at Snoopy if I were you; it's more user-friendly than curl to use in your PHP scripts. Here is some sample code:

<?php
    /*
    You need the snoopy.class.php from 
    http://snoopy.sourceforge.net/
    */

    include("snoopy.class.php");

    $snoopy = new Snoopy;

    // need a proxy?:
    //$snoopy->proxy_host = "my.proxy.host";
    //$snoopy->proxy_port = "8080";

    // set browser and referer:
    $snoopy->agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
    $snoopy->referer = "http://www.jonasjohn.de/";

    // set some cookies:
    $snoopy->cookies["SessionID"] = '238472834723489';
    $snoopy->cookies["favoriteColor"] = "blue";

    // set a raw header:
    $snoopy->rawheaders["Pragma"] = "no-cache";

    // set some internal variables:
    $snoopy->maxredirs = 2;
    $snoopy->offsiteok = false;
    $snoopy->expandlinks = false;

    // set username and password (optional)
    //$snoopy->user = "joe";
    //$snoopy->pass = "bloe";

    // fetch the text of the website www.google.com:
    if($snoopy->fetchtext("http://www.google.com")){ 
        // other methods: fetch, fetchform, fetchlinks, submittext and submitlinks

        // response code:
        print "response code: ".$snoopy->response_code."<br/>\n";

        // print the headers:

        print "<b>Headers:</b><br/>";
        // each() is removed in PHP 8; foreach does the same job
        foreach($snoopy->headers as $key => $val){
            print $key.": ".$val."<br/>\n";
        }

        print "<br/>\n";

        // print the texts of the website:
        print "<pre>".htmlspecialchars($snoopy->results)."</pre>\n";

    }
    else {
        print "Snoopy: error while fetching document: ".$snoopy->error."\n";
    }
?>
Shadi Almosri
A: 

Use VietSpider Web Data Extractor.

VietSpider Web Data Extractor: software that crawls data from websites (data scraper), formats it to standard XML (text, CDATA), and stores it in a relational database. The product supports various RDBMSs such as Oracle, MySQL, SQL Server, H2, HSQL, Apache Derby, and Postgres. The VietSpider crawler supports sessions (login, query by form input), multiple downloads, JavaScript handling, and proxies (including multiple proxies by auto-scanning proxy lists from websites).

Download from http://binhgiang.sourceforge.net

vietspider