I'm developing a sort of personalized search engine in Ruby on Rails, and I'm currently trying to find the best way to sort results depending on the user's record, in real time.
Example: items that are searched for can have tags (separate entities with IDs), e.g. an item has tags=[1, 5, 10, 23, 45].
The user, on the other hand, may have flagged some tags as being of special interest, so say the user has tags=[5, 23].
The score used for sorting the results should take into account the number of the item's tags that are "starred" by the user. For example, an item's score could be 50% based on the item's attributes and 50% on a rank depending on the user's record (the number of tags starred).
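To make the idea concrete, here is a minimal sketch of that combined score; the 50/50 weights, the `attribute_score` input, and the method name are all illustrative assumptions, not a fixed design:

```ruby
require 'set'

# Hypothetical scoring: 50% from the item's own attribute score (assumed
# to be pre-computed and normalized to 0..1) and 50% from the fraction of
# the item's tags that the user has starred.
def personalized_score(item_tags, user_tags, attribute_score)
  starred  = Set.new(user_tags)
  overlap  = item_tags.count { |t| starred.include?(t) }
  tag_rank = item_tags.empty? ? 0.0 : overlap.to_f / item_tags.size
  0.5 * attribute_score + 0.5 * tag_rank
end

item_tags = [1, 5, 10, 23, 45]
user_tags = [5, 23]
personalized_score(item_tags, user_tags, 0.8)
# 2 of 5 tags starred -> 0.5 * 0.8 + 0.5 * 0.4 = 0.6
```

In this form a single score is one set lookup per tag, so the per-record cost is small; the open question is the total cost over the whole candidate set.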
One idea was to inject this into the sorting function of the information retrieval system. But in Sphinx, which I will probably use, this would be very awkward to implement (when the user's tag vector is large). I don't know about Lucene/Solr, but they don't seem to have the advanced non-text search capabilities that I need anyway (distance, date, time, etc.).
The other option is to retrieve an intermediate set from the IR system and then process it at the application level. However, I'm quite sure that processing 100-1000 records sequentially and then sorting them in Rails would be very slow.
On the other hand, it seems like a task that can easily be processed in parallel - divide the 1000 records into sets that are scored by separate threads and then merge-sorted.
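A minimal thread-based sketch of that idea (all names here are placeholders, and note that on MRI the GIL limits CPU-bound parallelism, so real speedups would need JRuby, Rubinius, or separate processes):

```ruby
# Stand-in for the real per-record scoring work: here, just the number
# of the record's tags the user has starred.
def score_record(record, user_tags)
  (record[:tags] & user_tags).size
end

# Split the records into slices, score each slice on its own thread,
# then merge the results and sort by score (descending).
def parallel_rank(records, user_tags, slices: 4)
  chunk   = (records.size / slices.to_f).ceil
  threads = records.each_slice(chunk).map do |part|
    Thread.new { part.map { |r| [r, score_record(r, user_tags)] } }
  end
  threads.flat_map(&:value).sort_by { |_, score| -score }.map(&:first)
end

records = [
  { id: 1, tags: [5, 23] },
  { id: 2, tags: [9] },
  { id: 3, tags: [5] },
]
parallel_rank(records, [5, 23], slices: 2).map { |r| r[:id] }
# => [1, 3, 2]
```

Whether the thread overhead pays off for only ~1000 records is exactly what I'm unsure about.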
I've read about several map-reduce implementations, both general-purpose like Hadoop and Rails-specific like Skynet, but they seem to be best suited for large batch jobs, not real-time processing (unless I'm mistaken?).
Is there any in-memory, lightweight MR implementation that I could use for this? Or maybe you have some other ideas on how to handle it?
(Sidenote: I believe this setup is similar to how Google News works, from what I understand from the "Google News personalization: scalable online collaborative filtering" paper. They match, in real time, a set of candidate stories against the set of clusters the user belongs to (pre-calculated earlier) in order to sort the stories in a personalized way.)