views:

46

answers:

2

I have a large directed, acylic graph (DAG) from which I would like to efficiently draw a sample node according to the following criteria:

  1. I specify a fixed node A that must never be sampled
  2. Nodes that directly or indirectly refer to A are never sampled
  3. All other nodes are sampled with equal probability

Nodes are stored as objects with pointers to the other nodes that they refer to, the entire graph can be reached from a single root node that refers to everything else directly or indirectly.

Is there a good algorithm to do this? Ideally without requiring large amounts of additional memory since the DAG is large!

A: 

I'm not an expert in this area by any means, but I think you may want a Monte Carlo Markov chain sampling method such as the Metropolis-Hastings or Gibbs sampling algorithm.

You can find some code samples online, and you might have to modify the code to do exactly what you want it to do. There's some good presentations on the topic, like this.

Some software that might help you along are:

I don't know your familiarity with graph theory, so I'm not sure how difficult it would be for you to implement this.

Hope this was helpful...

The Alchemist
Thanks for the interesting links! However I'm not so much after general sampling theory and tools but a specific algorithm for sampling a DAG
mikera
Sorry about that! I think aioobe has a pretty good solution, which is a basic Monte Carlo Markov chain solution, IIRC.If you have a large graph, picking a good random number generator for aioobe's step 3 will be more and more important.
The Alchemist
+2  A: 

The only solution I can come up with is to

  1. put the nodes in a hash set
    (traverse them from the root using, say, a breadth first traversal), O(|E|+|V|)

  2. start from node A and remove all predecessors by traversing the edges backwards
    (again O(|E|+|V|))

  3. select a random node from the remaining nodes.

This would result in a O(|E|+|V|) algorithm with a O(|V|) memory requirement.

Note that you wouldn't have to copy the nodes in step 1, only save a reference to the node.

aioobe