I often check my accounts for different numbers. For example, I check my affiliate accounts for cash increases. I want to write a script that logs in to all these websites, grabs the money value for me, and displays it on one page. How can I program this?

+3  A: 

You should take a look at curl. With it you should be able to write a script that retrieves a webpage easily.

Also take a look at simplexml and dom; they will help you extract information from (X)HTML files.

Zend_Http could also be a good alternative to curl.
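
For example, a minimal sketch of fetching a page with curl (the URL is a placeholder for one of your account pages):

<?php
    // fetch a page with curl; the URL is a placeholder
    $ch = curl_init("http://example.com/account");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects

    $html = curl_exec($ch);
    if ($html === false) {
        die("curl error: " . curl_error($ch));
    }
    curl_close($ch);

    // $html now contains the page markup, ready for simplexml/dom
    echo $html;
?>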

Cheers

RageZ
+1  A: 

Well, it's sort of a vague question... I'd suggest the following steps:

  • send the login credentials via POST
  • grab and parse the response
  • do this for all relevant accounts / sites you want to check

If you face specific problems, feel free to comment on this answer.

EDIT: I agree with RageZ on the technical approach. curl would be the 'weapon of choice' for me too... ^^
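
A minimal sketch of the login step with curl (the URL and the field names "username"/"password" are hypothetical; check the site's actual login form for the real ones):

<?php
    // POST the credentials and keep the session cookie for later requests
    $ch = curl_init("http://example.com/login"); // hypothetical login URL
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
        'username' => 'joe',    // hypothetical field name
        'password' => 'secret', // hypothetical field name
    )));
    curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/cookies.txt");  // store session cookies
    curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookies.txt"); // send them back on later requests
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    $response = curl_exec($ch);
    curl_close($ch);

    // with the cookie file in place, a second curl request to the
    // account page will be made as a logged-in user
?>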

hth

K

KB22
A: 

First of all, check whether the services you want to log in to have APIs.
That would be much easier, as an API is a format specifically made for getting the data and using it in another application.

If there is an API, you can look at its documentation to see how to retrieve and use the data.
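
For instance, assuming a hypothetical JSON endpoint (the URL and the 'balance' field are made up), reading a value can be this simple:

<?php
    // hypothetical JSON API call; the endpoint and field name are assumptions
    $json = file_get_contents("https://api.example.com/v1/balance?api_key=YOUR_KEY");
    $data = json_decode($json, true);
    echo "Balance: " . $data['balance'];
?>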

If there isn't one, you need to scrape the HTML pages.
You can start by taking a look at curl: http://php.net/curl
The idea is to simulate your own visit to the website by sending the login POST request and getting the returned data.

After retrieving the page's data, you can parse it with tools like dom: http://php.net/dom
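
For example, assuming $html holds a fetched page and the value sits in a span with the hypothetical id "balance":

<?php
    // parse the balance out of fetched HTML with dom and XPath
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ suppresses warnings from malformed real-world HTML
    $xpath = new DOMXPath($dom);

    // the id "balance" is an assumption; inspect the real page to find the element
    $nodes = $xpath->query('//span[@id="balance"]');
    if ($nodes->length > 0) {
        echo "Balance: " . trim($nodes->item(0)->nodeValue);
    }
?>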

Damien MATHIEU
A: 

Use TestPlan; it was designed as a web automation system and makes such tasks very simple.

edA-qa mort-ora-y
A: 

Do you want to grab the data manually by coding it? You can have it done automatically instead. You can use Automation Anywhere; it's a data extraction tool. I use it for the same kind of purpose you're describing. Some details on this web data grabber.

Bob
A: 

I would really have a look at Snoopy if I were you; it's more user-friendly than curl to use in your PHP scripts. Here is some sample code:

<?php
    /*
    You need the snoopy.class.php from 
    http://snoopy.sourceforge.net/
    */

    include("snoopy.class.php");

    $snoopy = new Snoopy;

    // need a proxy?:
    //$snoopy->proxy_host = "my.proxy.host";
    //$snoopy->proxy_port = "8080";

    // set browser and referer:
    $snoopy->agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
    $snoopy->referer = "http://www.jonasjohn.de/";

    // set some cookies:
    $snoopy->cookies["SessionID"] = '238472834723489';
    $snoopy->cookies["favoriteColor"] = "blue";

    // set a raw header:
    $snoopy->rawheaders["Pragma"] = "no-cache";

    // set some internal variables:
    $snoopy->maxredirs = 2;
    $snoopy->offsiteok = false;
    $snoopy->expandlinks = false;

    // set username and password (optional)
    //$snoopy->user = "joe";
    //$snoopy->pass = "bloe";

    // fetch the text of the website www.google.com:
    if($snoopy->fetchtext("http://www.google.com")){ 
        // other methods: fetch, fetchform, fetchlinks, submittext and submitlinks

        // response code:
        print "response code: ".$snoopy->response_code."<br/>\n";

        // print the headers:

        print "<b>Headers:</b><br/>";
        // each() is removed in PHP 8; foreach does the same job
        foreach($snoopy->headers as $key => $val){
            print $key.": ".$val."<br/>\n";
        }

        print "<br/>\n";

        // print the texts of the website:
        print "<pre>".htmlspecialchars($snoopy->results)."</pre>\n";

    }
    else {
        print "Snoopy: error while fetching document: ".$snoopy->error."\n";
    }
?>
Shadi Almosri
A: 

Use VietSpider Web Data Extractor.

VietSpider Web Data Extractor: software that crawls data from websites (data scraper), formats it to standard XML (text, CDATA), and stores it in a relational database. The product supports various RDBMSs such as Oracle, MySQL, SQL Server, H2, HSQL, Apache Derby, and Postgres. The VietSpider crawler supports sessions (login, query by form input), multiple downloads, JavaScript handling, and proxies (including multiple proxies by auto-scanning proxy lists from websites).

Download from http://binhgiang.sourceforge.net

vietspider