views:

258

answers:

4

Curious what is recognized as a solid algorithm/approach for judging the strength of a directed acyclic graph - particularly the strength of certain nodes. The main question I have about this can be boiled down to the following two graphs:

(if graph doesn't show up, click here or visit this link: http://www.flickr.com/photos/86396568@N00/2893003041/

To my eyes, A is in a stronger position than a. I am judging strength by how strong a node can remain if a link is knocked out. I'd call the first a thin "stilt", and the second a thick "stalk".

Here are the approaches I've considered so far for judging the strength of a node:

1) Counting the number of nodes below, subtracting the number of nodes above.

  • A=7, a=7, B=5, b=1

2) Counting the number of complete paths (to termination) for each node, summing their lengths.

  • A=17 (1+5+5+5+1), B=12 (4+4+4), a=9 (3+3+3), b=2
  • This make the stilt stronger, rather than the stalk.

3) Counting every possible path, treating every node as a destination.

  • A=9 (A->B, A->C, A->D, A->E, A->G, 2xA->F, 2xA->H), B=6, a=9, b=2

3 seems like the best option so far, but is there one that is better, generalized for DAGs? Is this something that has a known best approach? The principles are to use as much information in the graph as possible, and for the solution to be explainable in an intuitive way.

+2  A: 

This really depends on what you mean by strength. Because of the versatility of the DAG in representing information, you could be discussing anything from a multiple-outcome control flow to argument clauses of non-adverbial discourse connectives or even the full set of dependencies between different words in a sentence.

All of these would view node strength in a different way. For instance, a control flow may consider the node with the most amount of outcomes (therefore the most outward arcs) as the strongest, because it has the most power over the eventual outcome of the diagram. In discourse, the strongest node is the discourse connective, but it is found in speech and text after the first connective and before the third. Selection of the lexical "head" of a sentence is not directly related to the amount of arcs directly interacting with it.

What I'm getting at is that there is no real panacea for computing "strength" in this data type because of the polysemous character of the term "strength" and the kind of data a DAG fits. I would say that in a machine learning problem, all three of these approaches would be very informative in selecting particular types of nodes for a classification or ranking problem, but in the end, the answer depends upon the data type's practical application.

Robert Elwell
+1  A: 

I think you need to define "strength" more clearly. Is this related to the maximum flow problem?

Logan
A: 

Okay, the practical application is sports teams. Each node is a team, each link is a victory over another team. Assume there are no circular victory paths, like A->B->C->A. The objective is to get a power ranking that doesn't conflict with the graph, and ranks the teams in order of a team's strength. The site in question is my (somewhat tongue-in-cheek) football site, http://beatpaths.com/ , where you can see full graphs of the entire NFL season each week. (And other sports.) I'm basically looking for ranking algorithms other than the ones I listed above that might make even more sense and can be defended as using all the information in the graph. The goal isn't necessarily to be more accurate in terms of future picks (although a stronger algorithm probably would be), but instead to be as reasonably descriptive as possible of the season so far.

You can see Week 3 of the NFL Season up on the site. I've removed two "beatloops" (long story) that were ambiguous, relying on the rest of the graph to determine the pecking order.

This exact "practical application" is one of the datasets in the Stanford Graphbase, and is discussed in Knuth, Volume 4.
Michael Dorfman
A: 

If you're looking to make predictions, your best bet is probably going to be a maximum entropy ranking algorithm. The problem is going to be developing a large enough data set for your learner--the more, the better. It sounds like you can use the standings at each week of games as a single ranking instance.

Robert Elwell