ansaurus

Question

Converting python collaborative filtering code to use Map Reduce

Answer 1

+2 A:

This is not actually a "MapReduce" function, but it should give you a significant speedup without all of the hassle.

I would use numpy to "vectorize" the operation and make your life easier. With that in place, you just need to loop through the dictionary and apply the vectorized function, comparing each item against all of the ones that come after it.

import numpy as np

def cosSim(User, OUsers):
    """Determines the cosine similarity between one user and all others.

    Returns an array the size of OUsers with the similarity measures.

    User is a single array of the items purchased by a user.
    OUsers is a LIST of arrays purchased by other users.
    """
    User = np.asarray(User, dtype=float)
    OUsers = np.asarray(OUsers, dtype=float)  # one row per other user

    # apply the dot product between this user and all others
    num = OUsers.dot(User)

    # multiply this user's magnitude by the magnitude of each other user
    denom = np.sqrt((OUsers ** 2).sum(axis=1)) * np.sqrt((User ** 2).sum())

    return num / denom

# bnb is the dictionary from the question; its values are assumed to be
# numeric arrays of equal length
bnb_items = list(bnb.values())
for num in range(len(bnb_items) - 1):
    sims = cosSim(bnb_items[num], bnb_items[num + 1:])

I haven't tested this code so there may be some silly errors but the idea should get you 90% of the way.
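
As a rough illustration of the idea, here is a small self-contained sketch. It assumes each user's purchases are stored as a 0/1 array with one slot per item in a fixed item list. The item names, the user ids, and the helpers `to_vector` and `cosine_to_all` are made up for the example and are not from the original question.

```python
import numpy as np

# Hypothetical catalogue and purchase data; not from the original question.
items = ["book", "lamp", "mug", "pen"]
purchases = {
    "alice": {"book", "mug"},
    "bob": {"book", "mug", "pen"},
    "carol": {"lamp"},
}

def to_vector(bought, items):
    """Return a 0/1 array with one slot per entry in `items`."""
    return np.array([1.0 if item in bought else 0.0 for item in items])

def cosine_to_all(user, others):
    """Cosine similarity between one vector and each row of `others`."""
    others = np.asarray(others, dtype=float)
    num = others.dot(user)  # one dot product per row
    denom = np.linalg.norm(others, axis=1) * np.linalg.norm(user)
    return num / denom

user_ids = sorted(purchases)  # fixed order: alice, bob, carol
vectors = [to_vector(purchases[u], items) for u in user_ids]

# Compare the first user against everyone after them.
sims = cosine_to_all(vectors[0], vectors[1:])
```

Here `sims[0]` is the similarity between alice and bob, and `sims[1]` the similarity between alice and carol, who share no items. The sketch assumes every user has bought at least one item; an all-zero vector would make the denominator zero.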

This should give a SIGNIFICANT speedup. If you still need more speed, there is a wonderful blog post that implements a "Slope One" recommendation system here.

Hope that helps, Will

JudoWill 2010-05-21 15:43:19

I'm not familiar with numpy and its arrays and methods. It just shot to the top of my reading list. With respect to the dictionary user_vs_purchase, when you say the values are 1 for each item purchased, does the array store 1/0 for bought/not bought for every item in my database? Are the item ids part of the array? Also, since this is keyed off of userid, will this be more useful for finding user-user similarity, or is this the way to compute item-item similarity? I'm just not following the intent of your example. Would you kindly explain further?

Neil Kodner 2010-05-21 21:19:26

After reading through it again, I've made a change. All you actually need to do is loop through your bnb dictionary. You'll just have to make sure you keep the ordering correct.

JudoWill 2010-05-22 21:06:22
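
One possible way to keep the ordering straight is to fix the key order once and map every similarity back to its pair of keys. This is only a sketch: it assumes bnb maps each key to a 0/1 array of the same length with at least one nonzero entry, and the function name `pairwise_sims` and the sample data are hypothetical.

```python
import numpy as np

def pairwise_sims(bnb):
    """Return {(key_a, key_b): cosine similarity} for every unordered pair."""
    keys = sorted(bnb)  # fix the order once so rows and keys stay aligned
    matrix = np.array([bnb[k] for k in keys], dtype=float)
    norms = np.linalg.norm(matrix, axis=1)
    result = {}
    for i in range(len(keys) - 1):
        # compare row i against every later row in one vectorized step
        sims = matrix[i + 1:].dot(matrix[i]) / (norms[i + 1:] * norms[i])
        for offset, sim in enumerate(sims):
            result[(keys[i], keys[i + 1 + offset])] = float(sim)
    return result

# Hypothetical data: each value is a 0/1 array over the same four items.
example = {"u1": [1, 0, 1, 0], "u2": [1, 0, 1, 1], "u3": [0, 1, 0, 0]}
pairs = pairwise_sims(example)
```

Because the result is keyed by the pair of dictionary keys, it no longer matters in which order the dictionary itself happens to iterate.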
