pearson

Pearson Similarity Score, how can I optimise this further?

I have an implementation of Pearson's similarity score for comparing two dictionaries of values. More time is spent in this method than anywhere else (potentially many millions of calls), so this is clearly the critical method to optimise. Even the slightest optimisation could have a big impact on my code, so I'm keen to explore even the s...

Determining Perfect Hash Lookup Table for Pearson Hash

I'm developing a programming language in which I store objects as hash tables. The hash function I'm using is Pearson hashing, which depends on a 256-byte lookup table. Here's the function: char* pearson(char* name, char* lookup) { char index = '\0'; while(*name) { index = lookup[index ^ ...
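For readers unfamiliar with the technique named in this excerpt, here is a minimal Python sketch of 8-bit Pearson hashing. It is not the asker's C function, which is truncated above. The helper names `make_table` and `pearson_hash` and the seeded shuffle are assumptions made for illustration.

```python
import random

def make_table(seed=0):
    """Build a 256-entry permutation of 0..255 (hypothetical helper; any permutation works)."""
    rng = random.Random(seed)
    table = list(range(256))
    rng.shuffle(table)
    return table

def pearson_hash(data: bytes, table) -> int:
    """Fold each input byte into an 8-bit state through the lookup table."""
    h = 0
    for byte in data:
        # XOR the state with the next byte, then look the result up in the table.
        h = table[h ^ byte]
    return h

table = make_table()
digest = pearson_hash(b"name", table)  # always an integer in 0..255
```

Because the state is indexed as an unsigned value in 0..255, the table lookup never goes out of range.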

What is wrong with this python function from "Programming Collective Intelligence"?

This is the function in question. It calculates the Pearson correlation coefficient for p1 and p2, which is supposed to be a number between -1 and 1. When I use this with real user data, it sometimes returns a number greater than 1, like in this example: def sim_pearson(prefs,p1,p2): si={} for item in prefs[p1]: if ite...
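For comparison, a minimal sketch of the Pearson coefficient over the items two raters share, assuming `prefs` is a dict of dicts as the excerpt suggests. This is not the book's code and not a diagnosis of the truncated function above. The name `pearson_shared`, the choice to return 0.0 for degenerate input, and the final clamp are all assumptions.

```python
from math import sqrt

def pearson_shared(prefs, p1, p2):
    """Pearson correlation over items rated by both p1 and p2 (hypothetical helper)."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0.0  # assumption: no overlap counts as no correlation
    xs = [prefs[p1][item] for item in shared]
    ys = [prefs[p2][item] for item in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums, rather than the raw sum-of-squares shortcut.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant ratings are treated as uncorrelated
    r = cov / sqrt(var_x * var_y)
    # Clamp to guard against floating-point rounding just outside [-1, 1].
    return max(-1.0, min(1.0, r))
```

Both sums run over the same shared items, so the result stays within [-1, 1] apart from rounding, which the clamp absorbs.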

What is wrong with the Pearson algorithm from "Programming Collective Intelligence"?

This function is from the book "Programming Collective Intelligence", and is supposed to calculate the Pearson correlation coefficient for p1 and p2, which should be a number between -1 and 1. If two critics rate items very similarly, the function should return 1, or close to 1. With real user data I sometimes get weird results....

Collaborative Filtering Program: What to do for a Pearson Score When There Isn't Enough Data

I'm building a recommendation engine using collaborative filtering. For similarity scores, I use a Pearson correlation. This is great most of the time, but sometimes I have users who share only 1 or 2 fields. For example: User 1{ a: 4 b: 2 } User 2{ a: 4 b: 3 } Since this is only 2 data points, a Pearson correlation would always...
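One possible way to handle sparse overlap, sketched here under stated assumptions: refuse to score pairs with fewer than `min_overlap` shared fields, and shrink the score for small overlaps (often called significance weighting). The function names and the thresholds 3 and 10 are hypothetical choices, not taken from the question.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation; returns None when either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)

def damped_similarity(u1, u2, min_overlap=3, full_weight_at=10):
    """Pearson over shared keys, scaled down when the overlap is small."""
    shared = sorted(set(u1) & set(u2))
    n = len(shared)
    if n < min_overlap:
        return None  # assumption: too little data means "no score"
    r = pearson([u1[k] for k in shared], [u2[k] for k in shared])
    if r is None:
        return None
    # Scale linearly until the overlap reaches full_weight_at items.
    return r * min(n, full_weight_at) / full_weight_at

user1 = {"a": 4, "b": 2}
user2 = {"a": 4, "b": 3}
score = damped_similarity(user1, user2)  # None: only 2 shared fields
```

With the two users from the example, the undamped `pearson([4, 2], [4, 3])` evaluates to 1.0 even though the ratings differ, which is why the sketch declines to score such pairs.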

Please Optimize My Pearson Code

Hello there... I was wondering if you could optimize my code. When I ran it on localhost, it took about 17 minutes (a calculation with 100000 queries). The data looks like this: $data[UserID][ItemID] = Rating ==> $data[1][1] = 5; This is my code: <?php include "......."; set_time_limit(0); ...