Hello there. I was wondering if you could help me optimize my code. When I run it on localhost, it takes about 17 minutes (a calculation with 100,000 queries).

The data looks like this: $data[UserID][ItemID] = Rating, for example $data[1][1] = 5;

This is my code:

    <?php
        include ".......";

        set_time_limit(0);
        //FOR PEARSON SIMILARITY
        function pearson($u1,$u2){
           global $data;
           global $average;
           $total=0;
           $num=0;

           foreach($data[$u1] as $item=>$rating){
              $total+=$rating;
              $num++;
           }        

           $avg1=$total/$num;
           $avg1=$average[$u1];

           $total=0;
           $num=0;
           foreach($data[$u2] as $item=>$rating){
              $total+=$rating;
              $num++;
           }        

           $avg2=$total/$num;
           $rata2=$average[$u2];

           $nom=0;
           foreach($data[$u1] as $item=>$rating){
              if($data[$u2][$item]){
                $nom +=($rating-$avg1)*($data[$u2][$item]-$avg2); 
              }
           }


           foreach($data[$u1] as $item=>$rating){
              if($data[$u2][$item]){
                $bag1 += pow(($rating-$avg1),2);  
                $bag2 += pow(($data[$u2][$item]-$avg2),2); 
              }
            $den=$bag1*$bag2;  
           }

          $denom=sqrt($den);

          if($denom == 0) {
                        return 0;
              }

          return $nom/$denom;

        }


        //FOR RECOMMENDATION
        $sim_tres = 0.5;

        function av_rating($user){
            global $data;
            $total=0;
            $num=0;
            foreach($data[$user] as $item=>$rating){
              $total+=$rating;
              $num++;
            }
            $avg=$total/$num;
            return $avg;

            global $aver_age;
            return $aver_age[$user];
        }

        function p_rate($user,$item){

            global $sim_tres;
            global $data;
            $prc=array();
            $ru=av_rating($user);
            $aa=0;
            $bb=0;


            foreach($data as $u=>$rating){
                if((!$prc[$user][$u])&&(!$prc[$u][$user])){

                    $pr=pearson($user,$u);
                    $prc[$user][$u]=$pr;
                    $prc[$u][$user]=$pr;
                }else{

                    $pr=$prc[$user][$u];

                }
                if(($pr>=$sim_tres)&&($u!=$user)){
                    //echo $data[$u][$item];
                    //echo "<br>";
                    if($data[$u][$item]){
                        $rt=$data[$u][$item];
                    }else{
                        $rt=av_rating($u);
                    }
                    $aa+=($rt-av_rating($u))*$pr;
                    $bb+=abs($pr);
                }
            }
            return $ru+$aa/$bb;
        }

        function p_rate_all($user){
            global $data;
            global $total_item;
            $rec_ar=array();
            for($i=1;$i<=$total_item;$i++){
                //print $data[$user][$i];
                //echo "<br>";
                if(!$data[$user][$i]){
                    //echo "wew";
                    $rec_ar[$i]=p_rate($user,$i);
                    echo("<tr><td>".$i."</td><td>".$rec_ar[$i]."</td></tr>");

                }
            }
        }
    ?>

Can you help me optimize it? I appreciate any help. Sorry if my English is bad.

+2  A: 

As Pekka suggested, it's a good idea to figure out what parts of your script are taking a long time. I usually add timer code to my automated processes so I can watch them and figure out how long various parts are taking. Here's an example using microtime:

    $start_time = microtime(true); // float seconds; plain microtime() returns a "msec sec" string

    function mlog($message = '') {
        global $start_time;
        $run_time = microtime(true) - $start_time;
        echo "$message - $run_time \r\n";
        return $run_time;
    }

Then you just drop a call into your code wherever you want to see how long the script has been running:

    function pearson($u1,$u2){
        mlog('Starting Pearson Function Call');
        // ...

        mlog('Starting first loop');
        foreach($data[$u1] as $item=>$rating){ /* ... */ }
        mlog('Finished first loop');
        // ...
        # etc.
        mlog('Finished Pearson Function Call');

    }

If you want to calculate block times, you can use the return value from mlog(). This would help you determine which blocks are taking a particularly long time (so you know where you should spend your time optimizing). E.g.:

    $start = mlog('Block start');
    # Do something expensive
    $end = mlog('Block end');
    $blocktime = $end - $start;
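When pearson() is called thousands of times, a start/end pair inside it prints one line per call, and that output quickly becomes hard to read. A variation on the same idea is to add up the time for each label across all calls and print the totals at the end. This is a minimal sketch. time_block() is a hypothetical helper and is not part of PHP or of the answer above.

```php
<?php
// Hypothetical helper (not from the answer): runs $fn, adds its wall-clock
// time to $times[$label], and returns whatever $fn returned.
function time_block(array &$times, string $label, callable $fn)
{
    $start = microtime(true);                 // seconds as a float
    $result = $fn();
    $times[$label] = ($times[$label] ?? 0.0) + (microtime(true) - $start);
    return $result;
}

$times = [];

// Example: calls that share a label accumulate into a single total.
$sum = 0;
for ($call = 0; $call < 3; $call++) {
    $sum += time_block($times, 'inner loop', function () {
        $s = 0;
        for ($i = 0; $i < 100000; $i++) {
            $s += $i;
        }
        return $s;
    });
}

arsort($times);                               // slowest label first
foreach ($times as $label => $seconds) {
    printf("%s: %.4f s\n", $label, $seconds);
}
```

In the question's code, the closures would wrap the averaging loops and the similarity loop inside pearson(), each under its own label.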

Other Thoughts

An easy optimization is to restrict the places where you do any input or output. If your computations stop to read from disk or a database (especially in a loop), they'll take a very long time. You should get all of your data into memory before running computations.
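The same principle applies to values that the prediction step needs again and again: compute them once and keep them in memory. In the question's code, $prc = array() runs at the start of every p_rate() call. As a result, the user's similarity to every other user is recomputed for each unrated item, and av_rating() re-loops over a neighbour's ratings up to twice per item. The sketch below assumes $data has the shape described in the question. It computes the averages and the qualifying neighbours once per target user. predict_all_for_user() is a hypothetical name, and the $similarity callback stands in for pearson().

```php
<?php
// Sketch, assuming $data[userId][itemId] = rating as in the question.
// predict_all_for_user() is hypothetical; $similarity plays the role of pearson().
function predict_all_for_user(array $data, int $user, int $total_item,
                              callable $similarity, float $threshold = 0.5): array
{
    // 1. Every user's average rating, computed once.
    $avg = [];
    foreach ($data as $u => $ratings) {
        $n = count($ratings);
        $avg[$u] = $n > 0 ? array_sum($ratings) / $n : 0.0;
    }

    // 2. Similarity of $user to each other user, computed once (not once per item).
    $neighbours = [];
    foreach ($data as $u => $ratings) {
        if ($u === $user) {
            continue;
        }
        $s = $similarity($user, $u);
        if ($s >= $threshold) {
            $neighbours[$u] = $s;
        }
    }

    // 3. Predict every item $user has not rated, reusing the cached values.
    $predictions = [];
    for ($i = 1; $i <= $total_item; $i++) {
        if (isset($data[$user][$i])) {
            continue;
        }
        $num = 0.0;
        $den = 0.0;
        foreach ($neighbours as $u => $s) {
            // Same rule as p_rate(): a neighbour who has not rated $i falls back
            // to their own average, so they add 0 to the numerator.
            $rt = $data[$u][$i] ?? $avg[$u];
            $num += ($rt - $avg[$u]) * $s;
            $den += abs($s);
        }
        // Design choice (differs from p_rate()): with no neighbours, return the
        // user's average instead of dividing by zero.
        $predictions[$i] = $den > 0 ? $avg[$user] + $num / $den : $avg[$user];
    }
    return $predictions;
}
```

With the question's globals in place, a call could look like predict_all_for_user($data, $user, $total_item, 'pearson'). The output loop in p_rate_all() would then only need to echo the returned array.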

As your dataset grows, it will take longer to process. There's really no way around that, but you may be well served by offloading heavy computations to code written in C/C++.

banzaimonkey
Great tips. However, `log()` is predefined as part of PHP's math functions since PHP4.
Mike B
Thank you for your answer, banzaimonkey. When I ran this code, I used much the same method as you, but I only put the timers at the start and at the end of the process. :D
@Mike B Good catch, Mike. Updated to a non-reserved word.
banzaimonkey
+5  A: 

Try to avoid looping through the same arrays again and again:

       foreach($data[$u1] as $item=>$rating){ 
          if($data[$u2][$item]){ 
            $nom +=($rating-$avg1)*($data[$u2][$item]-$avg2);  
          } 
       } 


       foreach($data[$u1] as $item=>$rating){ 
          if($data[$u2][$item]){ 
            $bag1 += pow(($rating-$avg1),2);   
            $bag2 += pow(($data[$u2][$item]-$avg2),2);  
          } 
        $den=$bag1*$bag2;   
       } 

===>

       $nom = 0;
       $bag1 = 0;
       $bag2 = 0;
       foreach($data[$u1] as $item=>$rating){ 
          if($data[$u2][$item]){ 
            $nom +=($rating-$avg1)*($data[$u2][$item]-$avg2);  
            $bag1 += pow(($rating-$avg1),2);   
            $bag2 += pow(($data[$u2][$item]-$avg2),2);  
          } 
       } 
       $den=$bag1*$bag2;   // only the final product is needed, so compute it once
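The merged loop can go one step further and become a self-contained function that receives both users' ratings and their precomputed averages as parameters instead of reading globals. In the sketch below, pearson_single_pass() is a hypothetical name. It replaces pow($x, 2) with a plain multiplication and computes the denominator once after the loop. It also checks co-rated items with isset(), which, unlike the truthiness check, counts a rating of 0 and avoids undefined-index notices.

```php
<?php
// Sketch: one pass over $r1 accumulates the numerator and both sums of squares.
// $r1, $r2: itemId => rating for two users; $avg1, $avg2: their mean ratings,
// passed in so they are not recomputed on every call (hypothetical signature).
function pearson_single_pass(array $r1, array $r2, float $avg1, float $avg2): float
{
    $nom = 0.0;
    $bag1 = 0.0;
    $bag2 = 0.0;
    foreach ($r1 as $item => $rating) {
        if (!isset($r2[$item])) {
            continue;                  // only items both users rated
        }
        $d1 = $rating - $avg1;
        $d2 = $r2[$item] - $avg2;
        $nom  += $d1 * $d2;
        $bag1 += $d1 * $d1;            // cheaper than pow($d1, 2)
        $bag2 += $d2 * $d2;
    }
    $denom = sqrt($bag1 * $bag2);      // computed once, after the loop
    return $denom == 0.0 ? 0.0 : $nom / $denom;
}
```

Inside the question's pearson(), the body would reduce to something like return pearson_single_pass($data[$u1], $data[$u2], $average[$u1], $average[$u2]);.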

EDIT

What is the point of:

       foreach($data[$u1] as $item=>$rating){ 
          $total+=$rating; 
          $num++; 
       }         

       $avg1=$total/$num; 
       $avg1=$average[$u1]; 

That loop calculates $total and $num and uses them to set $avg1, which is then immediately overwritten with $average[$u1]. The next two lines even reset $total and $num to 0. All the time spent in that loop is wasted.
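Since $avg1 ends up as $average[$u1] anyway, the averaging loops can simply be deleted. The question does not show where $average comes from, so it may or may not be filled in by the included file. If it is not, it can be built once up front, before any pearson() calls. Note also that the question's code assigns $average[$u2] to $rata2 rather than $avg2, which looks like a typo. This is a minimal sketch that assumes $average should hold each user's mean rating. user_averages() is a hypothetical name.

```php
<?php
// Sketch, assuming $data[userId][itemId] = rating as in the question and that
// $average should map each user to their mean rating. Call it once, up front.
function user_averages(array $data): array
{
    $average = [];
    foreach ($data as $user => $ratings) {
        $n = count($ratings);
        // array_sum()/count() replace the manual $total/$num loop
        $average[$user] = $n > 0 ? array_sum($ratings) / $n : 0.0;
    }
    return $average;
}

$data = [1 => [1 => 5, 2 => 3], 2 => [1 => 4, 3 => 2]];
$average = user_averages($data);

// pearson() can then start with just:
//     $avg1 = $average[$u1];
//     $avg2 = $average[$u2];
// and neither averaging loop is needed.
```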

Mark Baker
Hmm... thank you, Mark. It makes the computation twice as fast as before. :D