I was going to Ask a Question earlier today when I was presented with a surprising feature of Stack Overflow. As I typed my question title, Stack Overflow suggested several related questions, and I found out that there were already two similar questions. That was stunning!
Then I started thinking about how I would implement such a feature, and how I would order questions by relatedness:
- Questions that share more words with the new question rank higher
- If the number of matching words is the same, the order of the words is considered
- Words that appear in the title carry more weight
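Here is one minimal Python sketch of those three criteria, offered under some assumptions. The first and third criteria are folded into a single weighted match count: a word found in a candidate's title counts 2 and a word found only in its body counts 1. Word order is measured as the longest common subsequence between the new title and the candidate title, and it serves only as the tie-breaker. The weights and all names (`Question`, `score`, `related`) are hypothetical and do not reflect how Stack Overflow actually does it.

```python
import re
from dataclasses import dataclass

TITLE_WEIGHT = 2  # hypothetical: a word matched in a candidate title counts double
BODY_WEIGHT = 1   # hypothetical: a word matched only in the body counts once


@dataclass
class Question:
    title: str
    body: str = ""


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence: how many words the two lists share in the same order."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def score(new_title: str, candidate: Question) -> tuple[int, int]:
    """Return (weighted match count, word-order score); tuples compare left to right."""
    query = tokenize(new_title)
    title_words = set(tokenize(candidate.title))
    body_words = set(tokenize(candidate.body))
    matches = 0
    for word in set(query):
        if word in title_words:
            matches += TITLE_WEIGHT
        elif word in body_words:
            matches += BODY_WEIGHT
    # Tie-breaker: query words that appear in the candidate title in the same order.
    order = lcs_length(query, tokenize(candidate.title))
    return matches, order


def related(new_title: str, questions: list[Question], limit: int = 5) -> list[Question]:
    """Rank candidates by score, dropping those that share no words at all."""
    scored = [(score(new_title, q), q) for q in questions]
    scored = [(s, q) for s, q in scored if s[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored[:limit]]


if __name__ == "__main__":
    questions = [
        Question("Relatedness by questions order"),
        Question("Order questions by relatedness"),
        Question("Sorting a list", "order questions somehow"),
    ]
    for q in related("Order questions by relatedness", questions):
        print(q.title)
    # Order questions by relatedness   (8 weighted matches, order score 4)
    # Relatedness by questions order   (8 weighted matches, order score 1)
    # Sorting a list                   (2 body-only matches)
```

The first two candidates tie on matched words, so word order decides between them. That is the second rule in the list.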
Would that be a simple workflow or a complex scoring algorithm? Some stemming to increase recall, maybe? Is there a library that implements this? What other aspects would you consider? Maybe Jeff could answer this himself! How did you implement this on Stack Overflow? :)
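To illustrate how stemming can raise recall, here is a deliberately crude suffix stripper. It is a hypothetical stand-in and not the Porter algorithm. A real system would more likely use an established stemmer such as NLTK's `PorterStemmer`. The suffix list, the three-character minimum, and the names `naive_stem`, `stems` and `shared_stems` are all assumptions made for this sketch.

```python
import re

# Hypothetical, deliberately crude suffix list; real stemmers such as Porter's use many more rules.
SUFFIXES = ("ing", "ed", "s")
MIN_STEM = 3  # never cut a word down to fewer than 3 characters


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, as long as enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words like "class" or "relatedness" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def stems(text: str) -> set[str]:
    """Tokenize into lowercase alphanumeric words and stem each one."""
    return {naive_stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())}


def shared_stems(new_title: str, other: str) -> set[str]:
    """Stems that appear in both texts."""
    return stems(new_title) & stems(other)


if __name__ == "__main__":
    # Exact word matching would only find "related"; stemming also pairs
    # "Implementing"/"implemented" and "questions"/"question".
    print(sorted(shared_stems("Implementing related questions",
                              "How was the related question feature implemented?")))
    # ['implement', 'question', 'relat']
```

Stems like "relat" are not real words. That does not matter here, because only equality between stems is used. The trade-off is that a stripper this crude still misses pairs such as "matches" and "match", and it can also conflate words that are unrelated.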