views:

74

answers:

3

Some probably wont know what XING is: it is a online network community like linkedIn etc. You can add new contacts, manage these contacts, search for new etc.

The whole application is done in Ruby and somewhat inspired by the small world theory, at least it is said so.

There is one specific feature that I really cant imagine how it is done. If you seach for some person Z, which is not in your contact list, and you click on Z´s profile, all possible connections from you to person Z are shown. Example:

  • YOU -> person B -> person C -> person E -> person Z
  • YOU -> person M -> person N -> person I -> person Z
  • YOU -> person M -> person K -> person J -> person Z

etc.

Especially if you have many contacts there are also many possible connections. And the app is fast!

So, my question is: how is such a feature implemented? What kind of system/database is behind it? I am only experienced in standard RDMBS like mySQL and MSSQL server, and I really cant imagine how to do something like the above mentioned in a standard database system.

Are there any algorithms which can help in computing the possible relationships? Currently I am reading a interesting book on algorithms like computing the similarity of two things etc. so the "implementation of that small world theory" would be very interesting for me.

Thanks in advance for any hints.

+1  A: 

It's probably some combination of pre-caching the people you're likely to click on, combined with any of the various Minimum Spanning Tree algorithms. For example, you might look at Kruskal's Algorithm (http://en.wikipedia.org/wiki/Kruskal%27s_algorithm). It doesn't have quite the behavior you describe, but would be useful for weighted trees where you have some idea of the strength of connection between two people.

For more general info, look at info on spanning trees: http://en.wikipedia.org/wiki/Minimum_spanning_tree

And more generally, read chapter 13 (on graphs) of Goodrich and Tomassia's "Data Structures and Algorithms in Java."

Paul McMillan
+1  A: 

I would have to say it's largely pre-cached data and relationships. As Paul mentioned, minimum spanning tree algorithms play a role too. But in terms of large-scale app speed, caching is golden.

Gabriel Hurley
Yeah, caching everyone who's within 3 or 4 connections would not be an outrageous task.
Paul McMillan
+1  A: 

Caching per-se doesn't really work here, except when it comes to caching a particular result - finding the shortest path between two nodes A and B is equivalent in cost to finding the shortest path between A and every other node N such that |AB| <= |AN|.

The most likely way they're doing this is to load the entire social graph into memory, and perform Dijkstra's algorithm on demand to find all shortest routes from you to the target. Although this doesn't scale in principle, in practice the size of the social graph and the speed of memory access is such that it's practical to calculate this in a very small amount of time. Storing it on disk would make this totally impractical, of course.

Nick Johnson