Hello,

I have two tables in my MySQL database.

table1 holds all the web pages in my network:

         | table1: (pages)|
         |----------------|
         | id   | url     |
         |----------------|

table2 has two fields: the source page of the link and the destination page of the link.

          |---------------------------|
          | table2: (links)           |
          |---------------------------|
          | from_page_id | to_page_id |
          |---------------------------|

How can I calculate the PageRank for my network?

I have found this article here that explains the PageRank algorithm, but I find it very difficult to write its formula in PHP, and I am not good at math.

Thanks

update:

I have almost 5000 pages in my network

A: 

Why do you need PageRank exactly, if it is your own network? Why not just count the number of links from unique pages to a particular page and use that number as the page rating?
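
A minimal sketch of that counting idea, assuming the rows of the "links" table have already been loaded into a PHP array of [from_page_id, to_page_id] pairs. The function name `countUniqueInboundLinks` and the sample data are made up for illustration:

```php
<?php
// Hypothetical helper: $links is a list of [from_page_id, to_page_id] pairs,
// i.e. the rows of the "links" table loaded into memory.
function countUniqueInboundLinks(array $links): array
{
    $sources = [];
    foreach ($links as [$from, $to]) {
        // Use the source id as a key so repeated links from the same page collapse.
        $sources[$to][$from] = true;
    }
    // Map each target page id to the number of distinct pages linking to it.
    return array_map('count', $sources);
}

// Sample data (assumed): page 1 links to page 2 twice, page 3 links to page 2,
// and page 2 links back to page 1.
$links = [[1, 2], [1, 2], [3, 2], [2, 1]];
$rating = countUniqueInboundLinks($links);
```

Pages that nobody links to do not appear in the result at all, so a caller would treat a missing id as a rating of 0.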

FractalizeR
@FractalizeR, I agree that your way of calculating would be easier, but the PageRank algorithm is better, so when you create a search engine for your network, just use PageRank; it is worth the headache :)
ahmed
A: 

Hi again,

I think I have figured out how to do it, but I am not sure.

I will tell you, and you can judge whether my way of calculating the PageRank is correct.

First, I added a new column to the "pages" table and called it "outgoinglinks"; it holds the number of outgoing links from that page.

I also added two more columns, "pagerank" and "pagerank2",

and another column called "i", which counts the number of iterations.

Now let's move on to the programming:

     $totalpages = 5000; // number of pages in the network
     // $step names the column that is read; the other column is written.
     // Starting at "pg2" makes the first pass read "pagerank", where the initial values are set.
     $step = "pg2";
     for ($i = 0; $i < 50; $i++) {
         // alternate between the two columns on every iteration
         if ($step == "pg2") {
             $step = "pg";
         } else {
             $step = "pg2";
         }
         $sql1 = "select id from pages";
         $result1 = $DB->query($sql1);
         while ($row1 = $DB->fetch_array($result1)) {
             $page_id = $row1["id"];
             // all links pointing to this page
             $sql = "select * from links where to_page_id = '$page_id'";
             $result = $DB->query($sql);
             $weights_of_links = 0; // sum of (pagerank / number of outgoing links)
             while ($row = $DB->fetch_array($result)) {
                 $from_page_id = $row["from_page_id"];
                 $row2 = get_record_select("pages", "id = '$from_page_id'");
                 $outgoinglinks = $row2["outgoinglinks"];
                 if ($step == "pg2") {
                     $from_page_id_pagerank = $row2["pagerank2"];
                 } else {
                     $from_page_id_pagerank = $row2["pagerank"];
                 }

                 $weights_of_links += ($from_page_id_pagerank / $outgoinglinks);
             }

             // final step: the formula from Wikipedia and the paper I referred to
             $pagerank = .15 / $totalpages + .85 * $weights_of_links;
             // write the new value to the column that was not read in this iteration
             // (the original used an undefined $url_id here; it must be $page_id)
             $ii = $i + 1;
             if ($step == "pg2") {
                 update_record("pages", "id='$page_id'", "pagerank='$pagerank',i='$ii'");
             } else {
                 update_record("pages", "id='$page_id'", "pagerank2='$pagerank',i='$ii'");
             }
         }
     }

Note:

Before you start, make sure to set the pagerank of one of the pages (any page) to 1 and leave the other pages at 0.

Why two pagerank columns?

I did that because I think every iteration should be kept separate to get an accurate calculation. The script alternates between the two columns: each iteration reads from one pagerank column and saves the new results to the other.

The code above loops many times (50 here) to get accurate results; with each pass we get closer to the real PageRanks of the pages.
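
The same read-one-buffer, write-the-other idea can be sketched without the database, which may be easier to test on a small graph first. This is only an illustration under assumptions: pages and links live in plain arrays, the function name `pageRank` and the three-page sample graph are invented here, and every page is assumed to have at least one outgoing link.

```php
<?php
// A sketch only: pages and links are held in arrays instead of MySQL tables.
// $pageIds: list of page ids; $links: list of [from_page_id, to_page_id] pairs.
function pageRank(array $pageIds, array $links, int $iterations = 50, float $damping = 0.85): array
{
    $n = count($pageIds);

    // Count outgoing links per page (the "outgoinglinks" column).
    $outgoing = array_fill_keys($pageIds, 0);
    foreach ($links as [$from, $to]) {
        $outgoing[$from]++;
    }

    // "Old" buffer: one page starts at 1 and the rest at 0, as in the note above.
    $current = array_fill_keys($pageIds, 0.0);
    $current[$pageIds[0]] = 1.0;

    for ($i = 0; $i < $iterations; $i++) {
        // "New" buffer: every page starts with its (1 - d) / N share.
        $next = array_fill_keys($pageIds, (1 - $damping) / $n);
        foreach ($links as [$from, $to]) {
            // Each link passes on an equal share of the source's previous rank.
            $next[$to] += $damping * $current[$from] / $outgoing[$from];
        }
        $current = $next; // the new values become the old ones for the next pass
    }
    return $current;
}

// Assumed sample graph: 1 -> 2, 2 -> 3, 3 -> 1, 1 -> 3.
$ranks = pageRank([1, 2, 3], [[1, 2], [2, 3], [3, 1], [1, 3]]);
```

Because every page in this sample has outgoing links, the ranks keep summing to 1 after each pass; pages without outgoing links would need extra handling that this sketch leaves out.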

My question is: should the sum of all the PageRanks in my network be equal to 1? If yes, how does Google give every page a rank out of 10?

Any ideas?

Thanks

ahmed
Good going, Ahmed. At a quick glance, your approach seems to reproduce the basic PageRank algorithm. A few things to note, however: 50 iterations is probably fine; I guess you found that easier than checking whether the PageRank had converged to a desired level of precision. Also, initializing only one page is OK, thanks to the 15%/N factor, but you could possibly help the system converge faster with better guesses for more of the nodes.
mjv
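
One possible way to check for convergence instead of fixing the number of iterations, as a rough sketch: add up how much every rank moved between two passes and stop once that total falls below a threshold. The helper names `rankChange` and `uniformStart`, the threshold 1e-6, and the sample values are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: sum of absolute rank changes between two iterations.
function rankChange(array $old, array $new): float
{
    $total = 0.0;
    foreach ($new as $pageId => $rank) {
        $total += abs($rank - ($old[$pageId] ?? 0.0));
    }
    return $total;
}

// Hypothetical helper: a uniform starting guess where every page gets 1/N.
function uniformStart(array $pageIds): array
{
    return array_fill_keys($pageIds, 1.0 / count($pageIds));
}

// Assumed sample values for two consecutive passes over four pages.
$old = uniformStart([1, 2, 3, 4]);
$new = [1 => 0.25, 2 => 0.30, 3 => 0.20, 4 => 0.25];

// The iteration loop would stop once this becomes true.
$converged = rankChange($old, $new) < 1e-6;
```

In the sample, two pages moved by 0.05 each, so the total change is about 0.10 and the loop would keep going.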
The PageRank is just a relative number; there is nothing wrong with changing its scale, which is probably what Google does. Otherwise the PR of even the most popular pages would be some lousy number like 0.00000000000000123. You might want to do the same, say by multiplying all PRs by 1000, for readability.
mjv
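
A small sketch of such a rescaling, assuming the ranks are already in a PHP array keyed by page id. The helper name `scaleRanks` and the sample values are invented for illustration, and this is not a claim about how Google actually scales its numbers:

```php
<?php
// Hypothetical helper: multiply every rank by the same factor.
// The relative order of the pages does not change.
function scaleRanks(array $ranks, float $factor = 1000.0): array
{
    return array_map(
        function ($rank) use ($factor) {
            return $rank * $factor;
        },
        $ranks
    );
}

// Assumed sample ranks keyed by page id.
$scaled = scaleRanks([1 => 0.0004, 2 => 0.0002]);
```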
@mjv, thanks. Another problem has come up now: some pages in my network have links to pages outside the network. Would this affect the accuracy of the calculation?
ahmed