views:

100

answers:

2

At linkedin, when you visit someones profile you can see how you are connected to them. I believe that linkedin shows upto 3rd level connections if not more, something like

shabda -> Foo user, bar user, baz user -> Joel's connection -> Joel

How can I represent this in the database.

If I model as,


User
  Id PK
  Name Char

Connection
  User1 FK
  User2 FK

Then to find the network, three levels deep, I need to get all my connection, their connections, and their connections, and then see if the current user is there. This obviously would be very inefficient with DB of any size, and probably clunky to work with as well.

Since, on linked in I can see this network, on any profile I visit, I don't think this is precalculated either.

The other thing which comes to my mind is probably this is best not stored in a relational DB, but then what would be the best way to store and retrieve it?

+5  A: 

My recommendation would be to use a graph database. There seems to be only one implementation currently available, and that's Neo4j. It's written in Java, but has bindings to Ruby and Scala (Python in progress).

If you don't know Java, you probably won't be able to find anything similar on any other platform (yet), unfortunately. However, if you do know Java (or are at least willing to learn), it's way worth it. (Technically you don't even need to learn Java because of the Ruby/Python bindings.) Neo4j was built for exactly what you're trying to do. You'd go through a ton of trouble trying to implement this in a relational database, when you'd be able to do the exact same thing in only a few lines of Java code, and also much more efficiently.

If that's not an option, I'd still recommend looking at other database types such as object databases. Relational databases weren't built for this kind of thing, and you'd go through more pain by trying to do it in an RDBMS than by switching to a different kind of database and learning it.

musicfreak
Thanks, that was what I was thinking too. Iused to work in Java, a few years back, hopefully they haven't rusted totally, and can be put to some good use.
uswaretech
To be honest, I'd never worked with Java before and I found Neo4j very straight-forward. So you'll be fine. :)
musicfreak
You'll find information about the language bindings for Neo4j on the wiki: http://wiki.neo4j.org/ . You can also build a domain specific RESTful API in Ruby or Scala (there's links in the wiki) if that suits your application well.
nawroth
+3  A: 

I don't see why there's anything wrong with using a relational database for this. The tables defined in the question are an excellent start. With proper optimization you'll be able to keep your performance well in hand. I personally think you would need something serious to justify shifting away from such a versatile mainstream product. You'll probably need an RBDMS in the project anyway and there are an unmatchable amount of legitimate choices in many price ranges (even free). You'll get quality documentation, support will be available, and you'll have a large supply of highly trained developers available in the job pool.

Regarding this model of self-relationships (users joined to other users), I recommend looking into recursive queries. That will keep you from performing a cascade of individual queries to find 3 levels of relationships. Consider the following SQL Server method for performing recursive queries with CTEs.

http://msdn.microsoft.com/en-us/library/ms186243.aspx

It allows you to specify how deep you want to go with the MAXRECURSION hint.

Next, you need to start thinking of ways to optimize. That starts with standard best-practices for setting up your tables with proper indexes and maintenance, etc. It inevitably ends with denormalization. That's one of those things you only do once you've already tried everything else, but if you know what you're doing and use good practices then your performance gain will be significant. There are many resources on the internet to help you learn about denormalization, just look it up.

SurroundedByFish
Why? Because you'll get extremely poor performance with an RDBMS. Try going beyond 3 levels of depth of relationships. It just can't happen, unless your users are willing to wait a few seconds for it to load (and your database server can handle it). Neo4j was built from the ground up with this in mind, and (from what I hear) going 100,000 levels deep can be done in a matter of a few seconds. Can an RDBMS do that?
musicfreak