I have a database of 20 million users and connections between those people. How can I prove the "Six degrees of separation" concept in the most efficient way, programmatically?
You just want to measure the diameter of the graph. This is exactly the metric to find the separation between the most distantly connected nodes in a graph.
There are lots of algorithms for this on Google, and the Boost Graph Library has implementations too.
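Not Boost, but to make the "measure the diameter" idea concrete, here is a minimal sketch using Python's networkx (a different library than the ones mentioned) on a small, made-up random graph; at 20 million nodes you would want the BFS approach in the next answer instead:

    import networkx as nx

    # hypothetical stand-in for the real 20-million-user graph:
    # a small random graph so nx.diameter finishes quickly
    G = nx.erdos_renyi_graph(1000, 0.01, seed=42)

    if nx.is_connected(G):
        # diameter = longest shortest path between any two nodes;
        # "six degrees of separation" holds iff this value is <= 6
        print("diameter:", nx.diameter(G))
    else:
        # a disconnected graph already disproves the claim
        print("graph is not connected")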
You can probably fit the graph in memory (as an adjacency list, where each vertex stores a list of its neighbors).
Then, from each vertex n, run a breadth-first search (using a queue) to a depth of 6 and count the vertices visited. If not all vertices are reached, you have disproved the theorem; otherwise, continue with the next vertex n.
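A minimal sketch of that depth-limited BFS in Python, assuming the graph is a dict mapping each vertex to a list of its neighbors (the names `adjacency`, `reaches_everyone_within` and `check_six_degrees` are just illustrative):

    from collections import deque

    def reaches_everyone_within(adjacency, source, max_depth=6):
        """Depth-limited BFS: True if every vertex is reachable from
        `source` in at most `max_depth` hops."""
        visited = {source}
        queue = deque([(source, 0)])
        while queue:
            vertex, depth = queue.popleft()
            if depth == max_depth:
                continue                      # don't expand past six hops
            for neighbor in adjacency[vertex]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append((neighbor, depth + 1))
        return len(visited) == len(adjacency)

    def check_six_degrees(adjacency):
        # a single failing source vertex is enough to disprove the claim
        return all(reaches_everyone_within(adjacency, v) for v in adjacency)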
This is O(N*(N + #edges)) = O(N*(N + 100*N)), roughly 100*N^2 if each user has 100 connections on average, which is not ideal for N = 20 million. I wonder whether the mentioned libraries can compute the diameter with a better time complexity (the general algorithm is O(N^3)).
The computations for individual vertices are independent, so they could be done in parallel.
A little heuristic: start with vertices that have the lowest degree (better chance to disprove the theorem).
I think the most efficient way (worst case) is roughly N^3. Build an adjacency matrix, then compute its powers: matrix^2, matrix^3, matrix^4, matrix^5 and matrix^6. Look for any pair whose entry is 0 in every matrix from the original through matrix^6; such a pair is more than six steps apart.
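A sketch of that idea with numpy boolean matrices (purely illustrative; a dense 20,000,000 x 20,000,000 matrix will not fit in memory, and `has_pair_beyond_six_hops` is a made-up name):

    import numpy as np

    def has_pair_beyond_six_hops(A, max_hops=6):
        """A: boolean N x N adjacency matrix (True where an edge exists).
        True if some pair of users has no path of <= max_hops edges."""
        n = A.shape[0]
        reach = A.astype(bool)                 # pairs joined by a walk of length exactly 1
        total = reach | np.eye(n, dtype=bool)  # accumulate walks of length 0..k
        for _ in range(max_hops - 1):
            # one more matrix product = walks one edge longer;
            # entries of `reach` are 0/1, so the integer product cannot overflow
            reach = (reach.astype(np.int64) @ A.astype(np.int64)) > 0
            total |= reach
        return not total.all()

    # tiny example: a path 0-1-2-...-7, so users 0 and 7 are 7 hops apart
    A = np.zeros((8, 8), dtype=bool)
    for i in range(7):
        A[i, i + 1] = A[i + 1, i] = True
    print(has_pair_beyond_six_hops(A))         # True: the six-degrees claim fails here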
Heuristically, you can try to single out subgraphs (large clumps of people who are only connected to other clumps by a relatively small number of "bridge" nodes), but there's absolutely no guarantee you'll have any.
Well, a better answer has already been given, but off the top of my head I would have gone with the Floyd-Warshall all-pairs shortest path algorithm, which is O(n^3). I'm unsure of the complexity of the graph diameter algorithm, but it "sounds" like it would also be O(n^3). I'd like clarification on this if anyone knows.
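For what it's worth, a bare-bones Floyd-Warshall sketch in Python (only viable for small graphs; at n = 20 million even the O(n^2) distance table is far too big, so treat this as an illustration of the idea, not a real solution):

    def floyd_warshall_diameter(adjacency):
        """adjacency: dict mapping each vertex to a list of neighbors.
        Returns the diameter (largest shortest-path distance); inf if disconnected.
        O(n^3) time, O(n^2) memory."""
        vertices = list(adjacency)
        INF = float("inf")
        dist = {u: {v: (0 if u == v else INF) for v in vertices} for u in vertices}
        for u, neighbors in adjacency.items():
            for v in neighbors:
                dist[u][v] = 1                # unweighted edges
        for k in vertices:
            for i in vertices:
                for j in vertices:
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
        return max(dist[u][v] for u in vertices for v in vertices)

    # six degrees of separation holds iff floyd_warshall_diameter(graph) <= 6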
On a side note, do you really have such a database? Scary.