views:

853

answers:

5

I've been playing around with some things and thought up the idea of trying to figure out Kevin Bacon numbers. I have data for a site that for this purpose we can consider a social network. Let's pretend that it's Facebook (for simplification of discussion). I have people and I have a list of their friends, so I have the connections between them. How can I calculate the distance from one person to another (basically, a Kevin Bacon number)?

My best idea is a bidirectional search with a depth limit (to limit computational complexity and avoid the problem of people who simply can't be connected in the graph), but I realize this is rather brute force.

Could it be better to make little sub-graphs (say something equivalent to groups on Facebook), calculate the shortest distances between them (ahead of time, perhaps) and then try to use THOSE to find a link? While this requires pre-calculation, it could make it possible to search many fewer nodes (nodes could be groups instead of individuals, making the graph much smaller). This would still be a bidirectional search though.

I could also pre-calculate the number of people an individual is connected to, searching the nodes for "popular" people first since they could have the best chance of connecting to the given destination individual. I realize this would trade a guaranteed shortest path for speed. I think I'd also want to use a depth-first search instead of the breadth-first search I plan to use in the other cases.

Can someone think of a simpler/faster way of doing this? I'd like to be able to find the shortest length between two people, so it's not as easy as always having the same end point (such as in the Kevin Bacon problem).

I realize that there are problems, like chains of 200 people and such, but that can be solved by having a limit to the depth I'm willing to search.

+4  A: 

Sounds like a job for Dijkstra's algorithm.

ED: Eh, I shouldn't have pulled the trigger so fast. Dijkstra's (and Bellman-Ford) reduce to a breadth-first search when the weights are all 1, so this isn't too useful. Oh well.

The A* algorithm, mentioned by tvanfosson, may be ideal for this. The idea is that instead of searching and recursing in whatever order the elements happen to be in at each level of the tree (rooted on your start- or end-point), you use some heuristic to determine which element you are going to try first. In your case a good bet would probably be the degree of a node (number of "friends"), but you might want to use the number of people within some arbitrary number of degrees of a given person (i.e., the guy who has three friends who each have 100 friends is likely to be a better node than the guy who has 20 friends in a clique that shuns outsiders). There are all sorts of other things you could use as a heuristic (friends get 2 points, friends-of-friends get 1 point; whatever, experiment).

Combine this with a depth limit (cut off after 6 degrees of separation, or whatever), and you can vastly improve your average case (worst case is still the same as basic BFS).
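As a rough illustration of the "try the most promising nodes first, with a depth cutoff" idea, here is a minimal Python sketch. It is not textbook A*: it is a level-by-level breadth-first search that simply sorts each level by degree before expanding it, so the first hit is still a shortest path. The function name `degree_ordered_search`, the `friends` dict (person -> list of friends) and the default `max_depth=6` are all assumptions made for the example.

```python
def degree_ordered_search(friends, start, goal, max_depth=6):
    """Hops from start to goal, or None if not found within max_depth.

    Hypothetical helper: `friends` maps each person to a list of friends.
    """
    if start == goal:
        return 0
    visited = {start}
    level = [start]
    for depth in range(1, max_depth + 1):
        # Heuristic: expand the best-connected people on this level first.
        level.sort(key=lambda p: len(friends.get(p, ())), reverse=True)
        next_level = []
        for person in level:
            for friend in friends.get(person, ()):
                if friend == goal:
                    # Levels are explored in order, so this is a shortest path.
                    return depth
                if friend not in visited:
                    visited.add(friend)
                    next_level.append(friend)
        if not next_level:
            return None  # nothing left to explore: the two are not connected
        level = next_level
    return None  # depth limit reached
```

Swapping the `key` function is where you would experiment with other scores (friends-of-friends counts and so on). As noted above, this can only help the average case; the worst case still touches everything within the depth limit.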

Adam Jaskiewicz
Agreed, I've used Dijkstra to solve the Kevin Bacon problem.
sfossen
what's wrong with BFS? I doubt it can be done faster...
Brian Postow
Nothing's wrong with it. If you want to limit the depth to, say, 6 degrees of separation, though, it makes sense to also use some sort of heuristic to determine which node to look at next in your breadth-first search (i.e. A*).
Adam Jaskiewicz
It won't improve worst-case, but it could improve average-case. Yes, it's still BFS, but "BFS" doesn't tell the whole story.
Adam Jaskiewicz
Mainly what I meant about it "not being useful" is that bog-standard BFS had already been mentioned, and I wasn't contributing anything new by suggesting an algorithm that is more general, but reduces to the same thing in this case. I've added more ideas to my answer to hopefully make it better.
Adam Jaskiewicz
+1  A: 

I think your easiest (logically) option is to use a Breadth First algorithm because you can go one level deep at a time and the first one that gets a "hit" is the shortest path automatically.

Of course, the A* algorithm would (likely) improve upon this idea by reordering which, from that level, you choose first.
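For reference, the plain breadth-first version can be sketched in a few lines of Python. The `friends` dict (person -> list of friends) and the name `bfs_distance` are assumptions made for this example, not anything from the question's actual data.

```python
from collections import deque


def bfs_distance(friends, start, goal):
    """Number of hops from start to goal, or None if they are not connected."""
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == goal:
                # People are dequeued in order of distance, so the first
                # hit is automatically a shortest path.
                return dist + 1
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, dist + 1))
    return None
```

The `visited` set is what keeps the search from looping forever on cycles of mutual friends.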

Joe Philllips
+11  A: 

This is a standard shortest path problem. There are lots of solutions, including Dijkstra's algorithm and Bellman-Ford. You may be particularly interested in looking at the A* algorithm and seeing how it would perform with the cost function relative to the inverse of any particular node's degree. The idea would be to visit more popular nodes (those with higher degree) first.

tvanfosson
+1 As I mentioned after thinking about things for a couple minutes, Dijkstra's and Bellman-Ford will both reduce into a simple breadth-first search when the edge weights are all 1. A* is worth a look, since it adds the heuristic. Combined with a limited depth, it may be the best you can get.
Adam Jaskiewicz
A* is probably the worst of the three for this type of search because it returns only the node closest to the heuristic, while Dijkstra's algorithm returns any of the closest nodes (the first one it finds), and might thus be done sooner because you're not looking for anything specific.
Jasper Bekkers
@Jasper -- the intuition would be that shortest paths tend to go through well-connected nodes -- this would be the hypothesis to test. If true, the heuristic would give you the shortest path sooner leading you to be able to terminate other (non-shortest) potential paths earlier.
tvanfosson
@tvanfosson: using the degrees of the vertices sounds like a good idea, but A* can only find one path to a node. You can't say "give me a path from here to some node that has a high degree" because now you're looking for a group of nodes. Anyway, this is probably something to benchmark.
Jasper Bekkers
+1  A: 

Run a breadth-first search in both directions (from each endpoint) and stop when you have a connection or reach your depth limit.
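A minimal Python sketch of that approach, assuming friendships are mutual (an undirected graph) and stored as a `friends` dict mapping each person to a list of friends. The names `bidirectional_distance` and `_expand`, and the default `max_depth=6`, are made up for the example.

```python
def _expand(friends, frontier, dist, other):
    """Grow one side by one level; return (new frontier, path length or None)."""
    next_frontier = []
    for person in frontier:
        for friend in friends.get(person, ()):
            if friend in other:
                # The two searches have met: join the two half-paths.
                return next_frontier, dist[person] + 1 + other[friend]
            if friend not in dist:
                dist[friend] = dist[person] + 1
                next_frontier.append(friend)
    return next_frontier, None


def bidirectional_distance(friends, start, goal, max_depth=6):
    """Hops between start and goal, or None if more than max_depth apart."""
    if start == goal:
        return 0
    dist_a, dist_b = {start: 0}, {goal: 0}
    frontier_a, frontier_b = [start], [goal]
    depth_a = depth_b = 0
    while frontier_a and frontier_b:
        if depth_a + depth_b >= max_depth:
            return None  # any connection found now would exceed the limit
        # Always expand the smaller frontier to keep the work down.
        if len(frontier_a) <= len(frontier_b):
            frontier_a, hit = _expand(friends, frontier_a, dist_a, dist_b)
            depth_a += 1
        else:
            frontier_b, hit = _expand(friends, frontier_b, dist_b, dist_a)
            depth_b += 1
        if hit is not None:
            return hit
    return None  # one side ran out of people: not connected
```

Because each side only has to reach roughly half the total distance, the two searches together usually visit far fewer people than a single search from one end.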

Steven A. Lowe
Better than A* in this case as an estimation function may not be available.
Joshua
+1  A: 

This one might be better overall: Floyd-Warshall, which computes the shortest distances between all pairs.
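A small Python sketch of Floyd-Warshall on a friend graph, where every friendship counts as distance 1. The `friends` dict (person -> list of friends) and the name `all_pairs_distances` are assumptions for the example.

```python
def all_pairs_distances(friends):
    """Floyd-Warshall: dist[a][b] is the hop count, or inf if unconnected."""
    # Collect everyone, including people who only appear in friend lists.
    people = set(friends)
    for friend_list in friends.values():
        people.update(friend_list)
    inf = float("inf")
    dist = {a: {b: (0 if a == b else inf) for b in people} for a in people}
    for a, friend_list in friends.items():
        for b in friend_list:
            if a != b:
                dist[a][b] = 1  # a direct friendship is one hop
    # Allow each person k in turn to act as an intermediate stop.
    for k in people:
        for i in people:
            for j in people:
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist
```

Keep in mind this takes O(n^3) time and O(n^2) memory for n people, so it only pays off as a one-time precomputation on a graph small enough to hold the full table.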

sfossen