views:

111

answers:

1

Classic social networks can be represented as a graph/matrix.

With a graph/matrix one can easily compute

  • shortest path between 2 participants
  • reachability from A -> B
  • general statistics (reciprocity, avg connectivity, etc)
  • etc

Is there an ideal data structure (or a modification to graph/matrix) that enables easy computation of the above while being time aware?

For example,

Input

t = 0...100

  • A <-> B (while t = 0...10)
  • B <-> C (while t = 5...100)
  • C <-> A (while t = 50...100)

Sample Queries

  • Is A associated with B at any time? (yes)
  • Is A associated with B while B is associated with C? (yes. @t = 5...10)
  • Is C ever reachable from A (yes. @ t=5 )
+3  A: 

What you're looking for is an explicitly persistent data structure. There's a fair body of literature on this, but it's not that well known. Chris Okasaki wrote a pretty substantial book on the topic. Have a look at my answer to this question.

Given a full implementation of something like Driscoll et al.'s node-splitting structure, there are a few different ways to set up your queries. If you want to know about stuff true in a particular time range, you would only examine nodes containing data about that time range. If you wanted to know what time range something was true, you would start searching, and progressively tighten your bounds as you explore each new node. Just remember that your results might not always be contiguous - consider two people start dating, breaking up, and getting back together.

I would guess that there's probably at least one publication worth of unexplored territory in how to do interesting queries over persistent graphs, if not much more.

Novelocrat