I would like information on algorithms that can help identify commonalities and differences between sets of overlapping data.
Using Stack Overflow's tag system as an example:
Let's say this question has been given 5 tags, and there are 1000 other questions that have at least one of these tags. Across those 1000 questions, which tags that my original post does not have appear most often, and how many questions share each of them?
A simpler way of describing this is an auto-suggest tagging system:
"You tagged your question with [5 tags I selected]. Other similar questions were tagged with [list of tags that might be of interest]." Here, [list of tags that might be of interest] are frequently occurring tags that aren't in my original list.
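To make the idea concrete, here is a minimal sketch of the naive approach, assuming each question is simply a collection of tag strings held in memory. The names `TagSuggester` and `Suggest` are made up for illustration and are not part of any existing API. It counts how often each tag outside my own list appears on questions that share at least one tag with mine, then ranks the results by that count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class TagSuggester
{
    // Returns tags that co-occur with myTags on other questions but are not
    // in myTags, ordered by how many overlapping questions carry them.
    public static List<KeyValuePair<string, int>> Suggest(
        IEnumerable<string> myTags,
        IEnumerable<IEnumerable<string>> otherQuestions,
        int maxSuggestions)
    {
        // Assumption: tags are compared case-insensitively.
        var mine = new HashSet<string>(myTags, StringComparer.OrdinalIgnoreCase);
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (var tags in otherQuestions)
        {
            // De-duplicate so a tag is counted at most once per question.
            var tagSet = new HashSet<string>(tags, StringComparer.OrdinalIgnoreCase);

            // Skip questions that share no tag with the original post.
            if (!tagSet.Overlaps(mine)) continue;

            foreach (var tag in tagSet)
            {
                if (mine.Contains(tag)) continue;   // already on my question
                counts.TryGetValue(tag, out var n); // n is 0 when the tag is new
                counts[tag] = n + 1;
            }
        }

        // Most frequent first; ties broken alphabetically for stable output.
        return counts
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.OrdinalIgnoreCase)
            .Take(maxSuggestions)
            .ToList();
    }
}
```

This treats every overlapping question equally. A possible refinement would be to weight each question by how many tags it shares with mine, but that is exactly the kind of algorithm I am asking about.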
Code examples in C# if possible :)