I want to show related [things/questions] in my app, similar to what StackOverflow does when you tab out of the Title field.

I can think of only one way to do it that might be fast enough:

  1. Search for the title in the corpus of titles of all [things] and return the first x matches. We can use whatever search is already being used for site search.

What other ways are there to do this that are fast enough? The request is sent on tab-out, so heavy server-side processing is not feasible.

I am just looking for a general way to do this, but I am using MySQL and Django, so if your answer uses those, all the better.

[I cannot think of good tags for it, so please feel free to edit them]

+1  A: 

You're looking at a content-based recommendation algorithm. AFAICT, StackOverflow's looks at the tags and the words in the title and finds questions that share some of them. It can be implemented as a nearest-neighbour search in a space where documents are represented as TF-IDF vectors.
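
To make the TF-IDF idea concrete, here is a minimal pure-Python sketch. It is only an illustration, not what StackOverflow actually runs: the function names (`build_tfidf`, `related`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions of mine, and it compares the query against every title by brute force rather than using an index.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_tfidf(titles):
    """Return (vectors, idf): one sparse TF-IDF dict per title, plus IDF weights."""
    docs = [Counter(tokenize(t)) for t in titles]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # document frequency: count each term once per title
    # Smoothed IDF (one common variant, chosen here for illustration).
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related(query, titles, vectors, idf, k=3):
    """Return up to k titles closest to the query, best match first."""
    counts = Counter(tokenize(query))
    # Words never seen in the corpus carry no information, so drop them.
    qvec = {term: tf * idf[term] for term, tf in counts.items() if term in idf}
    scored = [(cosine(qvec, vec), i) for i, vec in enumerate(vectors)]
    scored = [(score, i) for score, i in scored if score > 0.0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [titles[i] for _, i in scored[:k]]
```

You would call `build_tfidf` once (or whenever titles change) and `related` on each tab-out. The brute-force loop is fine for a small corpus; for a large one, a search index does the same job with far fewer comparisons.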

Implementation-wise, go with any Django-compatible search engine that supports stemming, stopwords, non-strict matches, and TF-IDF weights. The algorithmic complexity is low (just a few index lookups per query), so it doesn't matter if it's written in Python.
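
To show why a query costs only a few index lookups, here is a toy inverted index. Treat it as a sketch under assumptions: the stopword list is a tiny made-up sample, `stem` is a crude suffix stripper standing in for a real stemmer, and the names `build_index` and `candidates` are hypothetical. A real search engine does all of this for you.

```python
from collections import defaultdict

# Tiny illustrative stopword list (assumption); real engines ship larger ones.
STOPWORDS = {"a", "an", "the", "in", "to", "how", "of", "is", "by"}


def stem(word):
    # Crude suffix stripping, only a stand-in for a real stemming algorithm.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    # Lowercase, strip surrounding punctuation, drop stopwords, then stem.
    words = (w.strip("?.,!:;()[]\"'") for w in text.lower().split())
    return [stem(w) for w in words if w and w not in STOPWORDS]


def build_index(titles):
    """Map each term to the set of title ids that contain it."""
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for term in terms(title):
            index[term].add(doc_id)
    return index


def candidates(query, index):
    """Count shared terms per title, using one index lookup per distinct query term."""
    hits = defaultdict(int)
    for term in set(terms(query)):
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    return dict(hits)
```

The match is non-strict: a title becomes a candidate as soon as it shares one term with the query. The work per request is proportional to the number of query words and the length of their posting lists, not to the size of the whole corpus.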

If you don't find a search engine that does what you want, leave the stemming and stopwords to the search engine, query it once per word, and do your own TF-IDF scoring with a score that favors similar tags.
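
A rough sketch of that fallback could look like the following. Everything here is hypothetical: `search_word` stands for whatever per-word lookup your search engine exposes, `doc_tags` is assumed to be a dict from document id to a set of tags, and the `tag_bonus` multiplier is just one arbitrary way to favor shared tags.

```python
import math
from collections import defaultdict


def related_by_words(words, tags, search_word, doc_tags, total_docs,
                     tag_bonus=0.5, k=5):
    """Rank document ids by summed IDF of matching words, boosted by shared tags.

    search_word(word) -> iterable of document ids (hypothetical engine hook).
    doc_tags          -> dict mapping document id to a set of tag strings.
    total_docs        -> number of documents in the corpus.
    """
    scores = defaultdict(float)
    for word in set(words):
        ids = set(search_word(word))
        if not ids:
            continue  # the engine found nothing for this word
        # Rare words get a higher weight than common ones.
        idf = math.log(total_docs / len(ids)) + 1.0
        for doc_id in ids:
            scores[doc_id] += idf

    wanted = set(tags)
    for doc_id in scores:
        shared = len(wanted & doc_tags.get(doc_id, set()))
        # Each shared tag multiplies the score (arbitrary choice for this sketch).
        scores[doc_id] *= 1.0 + tag_bonus * shared

    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k]
```

The cost is one engine call per distinct word in the title, which is usually fewer than ten, so it should be cheap enough to run on tab-out as long as each call is an index lookup.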

Tobu
This needs to be very fast to work in a StackOverflow-like UI. I doubt that any recommendation algorithm implemented in Python is going to be fast enough.
uswaretech