Problem statement:

We have an equal number of men and women. Each man has a preference score for each woman, and each woman has one for each man. Every person has certain interests, and the preference scores are calculated from those interests.

Initially, we have an input file with x columns. The first column is the person (man/woman) id. Ids are just numbers from 0 ... n (the first half are men, the second half women). The remaining x-1 columns hold the interests, which are integers too.
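To make the layout concrete, here is a minimal Python sketch of reading such a file. It assumes whitespace-separated integer columns with one person per line (the separator is not stated above), and the names `parse_people` and `split_by_sex` as well as the sample rows are made up for illustration.

```python
def parse_people(text):
    """Parse rows of 'id interest1 interest2 ...' into {id: [interests]}.

    Assumption: columns are whitespace-separated integers, one person per line.
    """
    people = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        person_id, *interests = map(int, fields)
        people[person_id] = interests
    return people


def split_by_sex(people):
    """First half of the sorted ids are men, the second half are women."""
    ids = sorted(people)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Made-up sample input: 4 people, 3 interests each.
sample = """\
0 3 7 9
1 2 7 8
2 7 9 4
3 2 8 5
"""
people = parse_people(sample)
men, women = split_by_sex(people)
```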

From this n by x-1 matrix we derive an n by n/2 matrix. The new matrix has one row per person (men and women alike), and its columns hold that person's scores for each member of the opposite sex.

We have to sort each row's scores in descending order, and after sorting we still need to know which person's id each score belongs to.
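A minimal Python sketch of this step, keeping each id attached to its score by sorting (id, score) pairs. The scoring rule here (number of shared interests) is only an assumption, since the actual formula is not given; `score`, `ranked_preferences` and the sample data are hypothetical names and values.

```python
def score(a, b):
    # Hypothetical scoring rule: the number of interests two people share.
    return len(set(a) & set(b))


def ranked_preferences(people, men, women):
    """Return {person_id: [(other_id, score), ...]} sorted by score, descending."""
    prefs = {}
    for group, others in ((men, women), (women, men)):
        for p in group:
            scored = [(o, score(people[p], people[o])) for o in others]
            # Highest score first; ties broken by lower id so the order is deterministic.
            scored.sort(key=lambda pair: (-pair[1], pair[0]))
            prefs[p] = scored
    return prefs


# Made-up sample data: ids 0-1 are men, ids 2-3 are women.
people = {0: [3, 7, 9], 1: [2, 7, 8], 2: [7, 9, 4], 3: [2, 8, 5]}
prefs = ranked_preferences(people, men=[0, 1], women=[2, 3])
```

With this shape, `prefs[p][0]` is person `p`'s first choice together with its score, `prefs[p][1]` the second, and so on.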

So here I wanted to use a hash table.

Once we have the scores, we need to form pairs according to some rules.

My trouble is with the second matrix, the n by n/2 one, which records how strongly each man/woman prefers each woman/man. I need these scores sorted so that, for every man/woman, I know who the first preferred woman/man is, who the second is, and so on.

I am hoping for good suggestions on which data structures to use. I prefer PHP or Perl.

NB:

This is not homework. This is a slightly modified version of the stable marriage algorithm. I have a working solution. I am only working on optimizing my code.

It is very similar to the stable marriage problem, but here we need to calculate the scores based on the interests people share. So I have implemented it the way it is described on the wiki page http://en.wikipedia.org/wiki/Stable_marriage_problem.
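For reference, a minimal Python sketch of the textbook men-proposing Gale-Shapley procedure follows. It is not the implementation referred to above, whose code is not shown; the function name `gale_shapley` and the sample preference lists are made up for illustration, and the lists are assumed to be already sorted from most to least preferred.

```python
def gale_shapley(men_prefs, women_prefs):
    """Men-proposing Gale-Shapley.

    men_prefs / women_prefs: {id: [opposite-sex ids, most preferred first]}.
    Returns a dict mapping each woman to the man she ends up with.
    """
    # rank[w][m] = position of m in w's list (lower means more preferred).
    rank = {w: {m: i for i, m in enumerate(lst)} for w, lst in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman to propose to
    engaged = {}                             # woman -> man
    free = list(men_prefs)

    while free:
        m = free.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = engaged.get(w)
        if current is None:
            engaged[w] = m                   # w was free, she accepts
        elif rank[w][m] < rank[w][current]:
            engaged[w] = m                   # w prefers m, her old partner is free again
            free.append(current)
        else:
            free.append(m)                   # w rejects m, he tries his next choice later
    return engaged


# Made-up example: men 0 and 1, women 2 and 3.
men_prefs = {0: [2, 3], 1: [2, 3]}
women_prefs = {2: [1, 0], 3: [0, 1]}
matching = gale_shapley(men_prefs, women_prefs)
```

The only thing the procedure needs from the score matrix is each person's ordered list of ids, plus a quick way to compare two suitors, which is what the `rank` lookup provides.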

My difficulty is not in solving the problem. I have solved it and my code runs. I am just trying to find a better solution, so I am asking for suggestions on the type of data structure to use.

Conceptually, I tried using an array of hashes, where the array index gives the person id and the hash stored there maps ids to scores in sorted order. I start with an array of unsorted hashes. I then sort each hash by value, but I could not store the sorted hashes back in an array. So I just stored the keys after sorting and used them to look up the values in my initial unsorted hashes.
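The workaround just described might look roughly like this. The sketch is in Python rather than Perl or PHP, so it only approximates the real code; the scores are made-up numbers and `nth_choice` is a hypothetical helper.

```python
# scores[person_id] maps an opposite-sex id to that person's score for them.
# Made-up numbers: ids 0-1 are men, ids 2-3 are women.
scores = [
    {2: 5, 3: 9},
    {2: 7, 3: 1},
    {0: 4, 1: 6},
    {0: 8, 1: 2},
]

# For each person, keep only the ids, ordered by descending score.
sorted_ids = [sorted(row, key=row.get, reverse=True) for row in scores]


def nth_choice(person, n):
    """Return (id, score) of the person's n-th preference (0 = most preferred)."""
    other = sorted_ids[person][n]
    return other, scores[person][other]
```

Here `sorted_ids` holds the order and the original unsorted `scores` rows are still consulted for the values.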

Can we store the hashes after sorting? Can you suggest a better structure?

+1  A: 
Sinan Ünür
+1 Nice implementation, though I'm tempted to start thinking of micro-optimization :)
DVK
@DVK Thanks. I am not sure where the optimizations would start paying off. Without knowing the dimensions of the problem and the requirements, I would be tempted to leave it as is, although one could really go to town with this particular algorithm. I wish I understood SQL well enough to turn this into a bunch of `SELECT`s and `JOIN`s.
Sinan Ünür
@Sinan - OK, now I got the bug... I'll see if I can come up with some sort of prototype in SQL... barring that, I'll ask you to post this as an SO question ;) (I just realized I know what the G-S algo is - or rather, I knew it when I was 17, except for the name)... DANG I forgot most of the theoretical base I once had (had to free up the brain for calculations of "will the Managing Director of a trading systems group throw a fit if my program quality checks the data their process produces?" :) I don't think I can do it in pure set-based SQL, will need at least a WHILE loop :(
DVK
@DVK I will think about how to formulate it, unless you have already posted something. This stuff is fun, isn't it?
Sinan Ünür
@Sinan - What, marriage proposals??? I'd classify it as decidedly NOT fun :)
DVK
@DVK, ;-) I meant game theory and matching algorithms.
Sinan Ünür
@Sinan for some quick fun, google "Gale-Shapley" and "online dating"... Someone did some really cute study
DVK
You mean http://www.aeaweb.org/articles.php?doi=10.1257/aer.100.1.130 I had seen the working paper but I did not know it had been accepted by the AER. Great. One of my papers just got rejected by them :-(
Sinan Ünür
I think so - I read a blog post about the article so don't recall if it was AER or not, but the paper sounds like the same one.
DVK