For example, how does StackOverflow decide other questions are similar?

When I typed in the question above and then tabbed to this memo control I saw a list of existing questions which might be the same as the one I am asking.

What technique is used to find similar questions?

+1  A: 

I think it's plain old word matching. However, I might add that this feature does not work as well as I would like it to. It's much better to do a Google search with the site:stackoverflow.com prefix than to rely on SO to provide relevant suggestions.

Learning
A: 

The matching program would store an index of all questions. When you ask a question, the keywords in your question are matched against that index, much as Google Search does. The open-source Lucene search library can be (and quite probably has been) used for this. Since the results are not very accurate, I presume they index just the titles of the questions, as an approximation.
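To make the keyword-index idea concrete, here is a minimal sketch in Python. It is not how Stack Overflow or Lucene actually does it; the function names, the tiny stopword list, and the "count shared keywords" scoring are all assumptions made for illustration, and only titles are indexed.

```python
import re
from collections import defaultdict

# Illustrative stopword list (an assumption; real engines use larger ones).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "in", "of",
             "how", "do", "does", "i", "what"}

def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]

def build_index(titles):
    """Map each keyword to the set of question ids whose title contains it."""
    index = defaultdict(set)
    for qid, title in titles.items():
        for word in tokenize(title):
            index[word].add(qid)
    return index

def similar_questions(index, new_title, limit=5):
    """Rank indexed questions by how many distinct keywords they share."""
    scores = defaultdict(int)
    for word in set(tokenize(new_title)):
        for qid in index.get(word, ()):
            scores[qid] += 1
    # Highest score first; ties broken by question id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [qid for qid, _ in ranked[:limit]]

# Hypothetical sample data.
titles = {
    1: "How does StackOverflow find similar questions?",
    2: "What is the best way to parse XML in Python?",
    3: "Algorithm to find similar strings",
}
index = build_index(titles)
result = similar_questions(index, "What technique is used to find similar questions?")
print(result)
```

A real engine would also weight rare words more heavily (TF-IDF or similar) and apply stemming, which this sketch leaves out.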

Another related keyword is collaborative filtering, the technique popularized by Amazon for recommending products based on the behavior of similar customers. Here, an alternative algorithm based on collaborative filtering would be: extract keywords from the question, find the tags that have historically been associated with those keywords, and return questions that carry those tags. Experiments would be needed to see whether this works well at all.
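A rough sketch of that tag-based variant might look like the following. Strictly speaking it is a keyword/tag co-occurrence count rather than a full collaborative filter, and every name and the sample history here are hypothetical.

```python
import re
from collections import Counter, defaultdict

def keywords(text):
    """Set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_keyword_tag_counts(history):
    """Count how often each keyword appeared in a title carrying each tag."""
    counts = defaultdict(Counter)
    for title, tags in history:
        for word in keywords(title):
            counts[word].update(tags)
    return counts

def predict_tags(counts, new_title, top_n=2):
    """Let each keyword of the new title vote for its historical tags."""
    votes = Counter()
    for word in keywords(new_title):
        votes.update(counts.get(word, {}))
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:top_n]]

def questions_with_tags(history, tags):
    """Return titles of past questions that carry at least one of the tags."""
    wanted = set(tags)
    return [title for title, q_tags in history if wanted & set(q_tags)]

# Hypothetical question history: (title, tags).
history = [
    ("Parse XML in Python", ["python", "xml"]),
    ("Python list comprehension syntax", ["python"]),
    ("Full text search in SQL Server", ["sql-server", "search"]),
]
counts = build_keyword_tag_counts(history)
tags = predict_tags(counts, "How to parse a list in Python?")
suggestions = questions_with_tags(history, tags)
print(tags, suggestions)
```

Common words such as "in" add noise to the votes here, so a stopword filter would likely be needed before trying this on real data.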

Amit Kumar
+3  A: 

I got an email from [email protected] on Mar 20 that mentions how it works:

the "ask a question" search is exclusively on title and will not match anything in the body. It is a mystery to me why people think it's better.

The last sentence refers to the search bar, which I've found is less useful when I'm trying to find a specific question I've already seen.

RossFabricant
A: 

Poorly -- using MS SQL Full Text Search, I believe. You'll have better luck using Lucene, IMO. For more background on the topic see the Wikipedia article on Lucene or the general topic of information retrieval.

tvanfosson