I have an implemented of Pearson's Similarity score for comparing two dictionaries of values. More time is spent in this method than anywhere else (potentially many millions of calls), so this is clearly the critical method to optimise.
Even the slightest optimisation could have a big impact on my code, so I'm keen to explore even the smallest improvements.
Here's what I have so far:
def simple_pearson(v1,v2):
si = [val for val in v1 if val in v2]
n = len(si)
if n==0: return 0.0
sum1 = 0.0
sum2 = 0.0
sum1_sq = 0.0
sum2_sq = 0.0
p_sum = 0.0
for v in si:
val_1 = v1[v]
val_2 = v2[v]
sum1+=val_1
sum2+=val_2
sum1_sq+=pow(val_1,2)
sum2_sq+=pow(val_2,2)
p_sum+=val_1*val_2
# Calculate Pearson score
num = p_sum-(sum1*sum2/n)
temp = (sum1_sq-pow(sum1,2)/n) * (sum2_sq-pow(sum2,2)/n)
if temp < 0.0:
temp = -temp
den = sqrt(temp)
if den==0: return 1.0
r = num/den
return r