(This is no homework and no work issue. It's just my personal interest/occupation and completly fictional. But I am interested in a good algorithm or data structure.)
Let's assume, that I would run a dating site. And my special feature would be that the singles were matched by movie taste. (Why not?)
In that case I would need a way to store the movie ratings for each user. (So far no problem.) And I would need a data structure to find the best fitting user. The distance between two taste patterns would be the average distance between all ratings that both users made.
Example
movies A B C D E F G H I J K L M ...
user Xm 9 5 1 1 5
user Ym 4 6 1 8
user Zf 9 6 4 7
Distance(X,Z) = avg( abs(9-9) + abs(1-4) ) = 1.5
Distance(Y,Z) = avg( abs(4-6) + abs(6-4) + abs(8-7) ) = 1.666
So Mr. X fits slightly better to Mrs. Z, than Mr. Y does.
I like soulution that ...
- ... don't need many operations on the database
- ... don't need to handle a lot of data
- ... run fast
- ... deliver the best matching
- Ok, maybe I would consider good approximations too.
Try to keep in mind that this should also work with thousands of possible movies, users that rate only about 20-50 movies, and thousands of users.
(Because this is a mental puzzle and not a real problem, work-arrounds are not really helping.)
What would be your search algorithm or data structure?