I'm trying to implement the Pearson correlation coefficient between two users' sets of preferences in PHP. I'm porting the Python script that can be found at this URL: http://answers.oreilly.com/topic/1066-how-to-find-similar-users-with-python/
My implementation is the following:

    class LB_Similarity_PearsonCorrelation implements LB_Similarity_Interface
    {
        public function similarity($user1, $user2)
        {
            $sharedItem = array();
            $pref1 = array();
            $pref2 = array();

            // Index each user's ratings by item id.
            $result1 = $user1->fetchAllPreferences();
            $result2 = $user2->fetchAllPreferences();
            foreach ($result1 as $pref) {
                $pref1[$pref->item_id] = $pref->rate;
            }
            foreach ($result2 as $pref) {
                $pref2[$pref->item_id] = $pref->rate;
            }

            // Collect the items rated by both users.
            foreach ($pref1 as $item => $preferenza) {
                if (array_key_exists($item, $pref2)) {
                    $sharedItem[$item] = 1;
                }
            }

            // No items in common: no correlation can be computed.
            $n = count($sharedItem);
            if ($n == 0) {
                return 0;
            }

            // Sums, sums of squares and sum of products over the shared items.
            $sum1 = 0;
            $sum2 = 0;
            $sumSq1 = 0;
            $sumSq2 = 0;
            $pSum = 0;
            foreach ($sharedItem as $item_id => $pre) {
                $sum1 += $pref1[$item_id];
                $sum2 += $pref2[$item_id];
                $sumSq1 += pow($pref1[$item_id], 2);
                $sumSq2 += pow($pref2[$item_id], 2);
                $pSum += $pref1[$item_id] * $pref2[$item_id];
            }

            // Pearson score: numerator over denominator.
            $num = $pSum - (($sum1 * $sum2) / $n);
            $den = sqrt(($sumSq1 - pow($sum1, 2) / $n) * ($sumSq2 - pow($sum2, 2) / $n));
            if ($den == 0) {
                return 0;
            }
            return $num / $den;
        }
    }

To clarify the code: the method fetchAllPreferences returns a set of objects, one per rated item, and I copy them into arrays keyed by item_id for ease of handling.
I'm not sure that this implementation is correct; in particular, I have some doubts about the calculation of the denominator.
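For reference, this is the formula I believe the code should be computing, as far as I understand it. The notation is my own and not taken from the linked article: x_i and y_i are the ratings the two users gave to the i-th shared item, and n is the number of shared items.

```latex
r = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}
         {\sqrt{\left(\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)
                \left(\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right)}}
```

If that is the right formula, the numerator should correspond to $num (with $pSum, $sum1 and $sum2), and the square root in the denominator should correspond to $den (with $sumSq1, $sumSq2, $sum1 and $sum2).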
Any advice is welcome.
Thanks in advance!