views:

421

answers:

2

I'm thinking of an application that would try to prove the "Six degrees of separation" theory with a set of users that are part of a social network.

I would have those elements:

  1. A couple of users for which I'd like to prove the six degrees theory
  2. For each user, I know the list of friends in the social network

Which is the best algorithm to see if the two users are connected, with which degree and show the eventual steps in the connection?

+2  A: 

Some additional background material:

To solve this problem generally, you'd want to avoid web scraping and other ad-hoc techniques that are specific to one social network. Instead, you will probably want to look into XHTML Friends Network (XFN) which is a way to use the rel="" attribute of a hyperlink to indicate the relationship between the target of that hyperlink and you. There is also a competing standard called FOAF which uses RDF.

These microformats have been around for a while, but support for them has grown a great deal just recently. StackOverflow uses "me" in the link on your profile page. WordPress blogs provide an easy way in the editing interface for the blogroll to add these tags. Many social sites use these in links between friends to indicate relationships.

Because of this, Google has gotten interested in this, and is starting to mine this data. They have a Social Graph API that can mine both XFN and FOAF data to do exactly some of the things you want to do. I suggest you start there. The nice thing about Google's API is since they are mining this all over the web, you can widen out your search beyond the specific social network you had in mind.

Tim Farley
+3  A: 

Finding the degree of separation between two people in a social network is just a special case of finding the shortest path between two points in a graph. The most common approach is Dijkstra's algorithm, but see also a longer discussion of the Shortest path problem.

In addition, by running an All-pairs shortest path algorithm, you could find out the minimum, maximum, and average number of degrees of separation for the entire network.

dmazzoni