views:

79

answers:

2

Hey guys, I have a sort of speed dating type application (not used for dating, just a similar concept) that compares users and matches them in a round based event.

Currently I am storing each user to user comparison (using cosine similarity) and then finding a round in which both users are available. My current set up works fine for smaller scale but I seem to be missing a few matchings in larger data sets.

For example with a setup like so (assuming 6 users, 3 from each group)

Round (User1, User2)
----------------------------
1  (x1,y1)  (x2,y2)  (x3,y3)
2  (x1,y2)  (x2,y3)  (x3,y1)
3  (x1,y3)  (x2,y1)  (x3,y2)

My approach works well right now to ensure I have each user meeting the appropriate user without having overlaps so a user is left out, just not with larger data sets.

My current algorithm

I store a comparison of each user from x to each user from y like so

Round, user1, user2, similarity

And to build the event schedule I simply sort the comparisons by similarity and then iterate over the results, finding an open round for both users, like so:

event.user_maps.all(:order => 'similarity desc').each do |map|
  (1..event.rounds).each do |round|
    if user_free_in_round?(map.user1) and user_free_in_round?(map.user2)
      #creates the pairing and breaks from the loop
    end
  end
end

This isn't exact code but the general algorithm to build the schedule. Does anyone know a better way of filling in a matrix of item pairings where no one item can be in more than one place in the same slot?

EDIT

For some clarification, the issue I am having is that in larger sets my algorithm of placing highest similarity matches first can sometimes result in collisions. What I mean by that is that the users are paired in such a way that they have no other user to meet with.

Like so:

Round (User1, User2)
----------------------------
1  (x1,y1)  (x2,y2)  (x3,y3)
2  (x1,y3)  (x2,nil)  (x3,y1)
3  (x1,y2)  (x2,y1)  (x3,y2)

I want to be able to prevent this from happening while preserving the need for higher similar users given higher priority in scheduling.

In real scenarios there are far more matches than there are available rounds and an uneven number of x users to y users and in my test cases instead of getting every round full I will only have about 90% or so of them filled while collisions like the above are causing problems.

+1  A: 

I think the question still needs clarification even after edit, but I could be missing something.

As far as I can tell, what you want is that each new round should start with the best possible matching (defined as sum of the cosine similarities of all the matched pairs). After any pair (x_i,y_j) have been matched in a round, they are not eligible for the next round.

You could do this by building a bipartite graph where your Xs are nodes in one side and Ys are nodes in another side, and the edge weight is cosine similarity. Then you find the max weighted match in this graph. For the next rounds, you eliminate the edges that have already been used in previous round and run the matching algorithm again. For details on how to code max weight matching in bipartite graph, see here.

BTW, this solution is not optimum since we are proceeding from one round to next in a greedy fashion. I have a feeling that getting the optimum solution would be NP hard, but I don't have a proof so can't be sure.

Amit Prakash
See my edits for a more detailed end goal
Jimmy
A: 

I agree that the question still needs clarification. As Amit expressed, I have a gut feeling that this is an NP hard problem, so I am assuming that you are looking for an approximate solution.

That said, I would need more information on the tradeoffs you would be willing to make (and perhaps I'm just missing something in your question). What are the explicit goals of the algorithm?

Is there a lower threshold for similarity below which you don't want a pairing to happen? I'm still a bit confused as to why there would be individuals which could not be paired up at all during a given round...

Essentially, you are performing a search over the space of possible pairings, correct? Maybe you could use backtracking or some form of constraint-based algorithm to make sure that you can obtain a complete solution for a given round...?

geeketteSpeaks
Currently I have no cutoff point for pairing, the goal of the algorithm is to create a matrix of user, user pairings in such a way that we don't run into the issue of the second matrix i have in my question.
Jimmy