Hello guys,

Does anyone know if it is possible to display the Google PageRank of a particular website using a PHP script?

If it is possible, how do I do it?

+1  A: 

Okay, I rewrote my answer and extracted only the relevant part of my SEO helper. (My previous version had other stuff in it, like Alexa Rank, Google Index, Yahoo Links, etc. If you are looking for that, just check an older revision of this answer!)

Please be aware that there are pages that have NO PAGERANK, and by "no" I DON'T MEAN ZERO. There is just none. This may be because the page is very unimportant (even less important than PR 0), or because it is so new that it has no rank yet even though it might well be important. My class treats this case the same as PR 0!

This has pros and cons. If possible, you should handle that case separately in your logic, but this is not always possible, so treating it as 0 is the next best approach.
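If you do want to keep "no PageRank" apart from a real PR 0, one option is to return null for the missing case. The following is only a minimal sketch: `parseRankBody` is a hypothetical helper that is not part of the class in this answer, and it assumes the response body carries a line of the form `Rank_1:1:5`, with the rank after the last colon.

```php
<?php
// Hypothetical helper, not part of the class in this answer.
// Assumption: the response body contains a line like "Rank_1:1:5",
// where the number after the last colon is the PageRank.
function parseRankBody(string $body): ?int
{
    if (preg_match('/Rank_\d+:\d+:(\d+)/', $body, $m) === 1) {
        return (int) $m[1];   // a real rank, possibly 0
    }
    return null;              // no rank line at all: "no PageRank"
}

$rank = parseRankBody("Rank_1:1:0\n");
if ($rank === null) {
    echo "no PageRank\n";
} else {
    echo "PageRank $rank\n";
}
```

Callers can then compare with `=== null` and decide for themselves whether an unranked page should count as 0.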

Furthermore:

This code is reverse engineered and does not use any kind of API that comes with an SLA or anything like it. So it might stop working AT ANY TIME!

And PLEASE DON'T FLOOD GOOGLE!

I tested it. If you sleep only very briefly between requests, Google blocks you after about 1000 requests (for quite some time!). With a random sleep between 1.5 and 2 seconds it looks fine.

I once crawled the PageRank for 70k pages. Only once, because I just needed it. I did only 5k a day from several IPs, and now I have the data, and it doesn't get outdated because the pages have been there for decades.

IMO it's totally OK to check a PageRank once in a while, or even a few at once, but don't misuse this code or Google may lock us out altogether!
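The random pause described above could be wrapped around any batch of lookups roughly like this. It is a sketch under assumptions: `lookupAllThrottled` and its `$lookup` callback are hypothetical names, with `$lookup` standing in for whatever performs a single request. The demo uses a fake lookup and tiny delays so that it finishes quickly.

```php
<?php
// Hypothetical helper: $lookup stands in for whatever performs one request.
function lookupAllThrottled(array $urls, callable $lookup, float $minSec = 1.5, float $maxSec = 2.0): array
{
    $results = [];
    $last = count($urls) - 1;
    foreach (array_values($urls) as $i => $url) {
        $results[$url] = $lookup($url);
        if ($i < $last) {
            // Sleep a random time between $minSec and $maxSec before the next request.
            usleep(random_int((int) ($minSec * 1000000), (int) ($maxSec * 1000000)));
        }
    }
    return $results;
}

// Demo with a fake lookup and very short delays.
$demo = lookupAllThrottled(['a.example', 'b.example'], function ($u) {
    return strlen($u);
}, 0.01, 0.02);
print_r($demo);
```

For real use, keep the default 1.5 to 2 second range and pass a callback that does the actual request.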

<?php
/*
 * @author Joe Hopfgartner <[email protected]>
 */
class Helper_Seo
{

    // Declared static because it is called via self:: from the static getPageRank().
    protected static function _pageRankStrToNum($Str,$Check,$Magic) {
        $Int32Unit=4294967296;
        // 2^32
        $length=strlen($Str);
        for($i=0;$i<$length;$i++) {
            $Check*=$Magic;
            // If the float is beyond the boundaries of integer (usually +/- 2.15e+9 = 2^31),
            // the result of converting to integer is undefined
            if($Check>=$Int32Unit) {
                $Check=($Check-$Int32Unit*(int)($Check/$Int32Unit));
                // if the check is less than -2^31
                $Check=($Check<-2147483648)?($Check+$Int32Unit):$Check;
            }
            // Square brackets: the $Str{$i} offset syntax was removed in PHP 8.
            $Check+=ord($Str[$i]);
        }
        return $Check;
    }
    /*
    * Generate a hash for a URL
    */
    protected static function _pageRankHashURL($String) {
        $Check1=self::_pageRankStrToNum($String,0x1505,0x21);
        $Check2=self::_pageRankStrToNum($String,0,0x1003F);
        $Check1>>=2;
        $Check1=(($Check1>>4)&0x3FFFFC0)|($Check1&0x3F);
        $Check1=(($Check1>>4)&0x3FFC00)|($Check1&0x3FF);
        $Check1=(($Check1>>4)&0x3C000)|($Check1&0x3FFF);
        $T1=(((($Check1&0x3C0)<<4)|($Check1&0x3C))<<2)|($Check2&0xF0F);
        $T2=(((($Check1&0xFFFFC000)<<4)|($Check1&0x3C00))<<0xA)|($Check2&0xF0F0000);
        return($T1|$T2);
    }
    /*
    * Generate a checksum for the hash string
    */
    protected static function CheckHash($Hashnum) {
        $CheckByte=0;
        $Flag=0;
        $HashStr=sprintf('%u',$Hashnum);
        $length=strlen($HashStr);
        for($i=$length-1;$i>=0;$i--) {
            // Read one digit; the $HashStr{$i} offset syntax was removed in PHP 8.
            $Re=(int)$HashStr[$i];
            if(1===($Flag%2)) {
                $Re+=$Re;
                $Re=(int)($Re/10)+($Re%10);
            }
            $CheckByte+=$Re;
            $Flag++;
        }
        $CheckByte%=10;
        if(0!==$CheckByte) {
            $CheckByte=10-$CheckByte;
            if(1===($Flag%2)) {
                if(1===($CheckByte%2)) {
                    $CheckByte+=9;
                }
                $CheckByte>>=1;
            }
        }
        return '7'.$CheckByte.$HashStr;
    }
    public static function getPageRank($url) {
        $fp=fsockopen("toolbarqueries.google.com",80,$errno,$errstr,30);
        if(!$fp) {
            trigger_error("$errstr ($errno)<br />\n");
            return false;
        }
        else {
            $out="GET /search?client=navclient-auto&ch=".self::CheckHash(self::_pageRankHashURL($url))."&features=Rank&q=info:".$url."&num=100&filter=0 HTTP/1.1\r\n";
            $out.="Host: toolbarqueries.google.com\r\n";
            $out.="User-Agent: Mozilla/4.0 (compatible; GoogleToolbar 2.0.114-big; Windows XP 5.1)\r\n";
            $out.="Connection: Close\r\n\r\n";
            fwrite($fp,$out);
            #echo " U: http://toolbarqueries.google.com/search?client=navclient-auto&ch=".$this->CheckHash($this->_pageRankHashURL($url))."&features=Rank&q=info:".$url."&num=100&filter=0";
            #echo "\n";
            //$pagerank = substr(fgets($fp, 128), 4);
            //echo $pagerank;
            #echo "DATA:\n\n";
            $responseOK = false;
            $response = "";
            $inhead = true;
            $body = "";
            while(!feof($fp)) {

                $data=fgets($fp,128);

                if($data == "\r\n" && $inhead) {
                    $inhead = false;
                } else {
                    if(!$inhead) {
                        $body.= $data;
                    }
                }

                //if($data == '\r\n\r\n')
                $response .= $data;
                if(trim($data) == 'HTTP/1.1 200 OK') {
                    $responseOK = true;
                } 

                #echo "D ".$data;
                $pos=strpos($data,"Rank_");
                if($pos!==false) {
                    // Skip the 9-character prefix starting at "Rank_" (presumably of the form "Rank_1:1:").
                    $pagerank=trim(substr($data,$pos+9));
                    // Close the socket on every path, including before throwing.
                    fclose($fp);
                    if($pagerank === '0') {
                        return 0;
                    }
                    if(intval($pagerank) === 0) {
                        throw new Exception('could not get pagerank from string: '.$pagerank);
                    }
                    return intval($pagerank);
                }
            }
            fclose($fp);


            //var_dump($body);
            if($responseOK && $body=='') {
                return 0;
            }
            throw new Exception('Could not get PageRank: unknown error, probably a Google flood block. My tests showed that 1 request per second is okay; I recommend a random sleep between 1.5 and 2 seconds. With no sleep, it breaks at ~1000 requests.');
        }
    }

}
$url = "http://www.2xfun.de/";
$pagerank = Helper_Seo::getPageRank($url);
var_dump($pagerank); 
?>
Joe Hopfgartner
+1 for a code body too long for me to want to read through to verify it's actually correct
Chris
lol :D I think it doesn't get any shorter, not with proper error handling (ignoring the 3 little functions at the end of the class that are just there and don't need to be read. I left them in because they might be of use to somebody who wants to know the PR).
Joe Hopfgartner
Hello, is there any way you can let me know about the API? And if Google blocks it, I hope they don't block the website the call is coming from from using any Google services, because I use a lot of Google services. Is this some sort of violation?
Smith
There IS NO API. The provided code is reverse engineered from the Google Toolbar. AFAIK the IP is blocked from accessing this particular service after ~1000 requests without delay. A 1-second delay seems to work fine; I would recommend a dynamic delay of 2-3 seconds. I don't think it will affect your site's Google ranking or anything, but it's impossible to say for sure, and I wouldn't recommend doing it from your web server's IP. I would, however, strongly recommend not doing any form of batch crawling from the IP that serves your website.
Joe Hopfgartner