views:

166

answers:

4

Given a DAG, in which each node belongs to a category, how can this graph be transformed into a table with a column for each category? The transformation doesn't have to be reversible, but should preserve useful information about the structure of the graph; and should be a 'natural' transformation, in the sense that a person looking at the graph and the table should not be surprised by any of the rows. It should also be compact, i.e. have few rows.

For example given a graph of nodes a1,b1,b2,c1 with edges a1->b1, a1->b2, b1->c1, b2->c1 (i.e. a diamond-shaped graph) I would expect to see the following table:

a  b  c
--------
a1 b1 c1
a1 b2 c1

I've thought about this problem quite a bit, but I'm having trouble coming up with an algorithm that gives intuitive results on certain graphs. Consider the graph a1,b1,c1 with edges a1->c1, b1->c1. I'd like the algorithm to produce this table:

a  b  c 
--------
a1 b1 c1

But maybe it should produce this instead:

a  b  c 
--------
a1    c1
a1 b1

I'm looking for creative ideas and insights into the problem. Feel free to vary to simplify or constrain the problem if you think it will help.

Brainstorm away!

Edit:

The transformation should always produce the same set of rows, although the order of rows does not matter.

The table should behave nicely when sorting and filtering using, e.g., Excel. This means that mutliple nodes cannot be packed into a single cell of the table - only one node per cell.

A: 

How about compacting all reachable nodes from one node together in one cell ? For example, your first DAG should look like:

a   b        c
---------------
a1  [b1,b2]  
    b1       c1
    b2       c1
Joy Dutta
It's not really what I'm looking for in the table. I guess I consider it important that the table be well behaved in a relational way, in the sense that I'd like to be able to sort on columns, and filter using e.g. autofilter in excel. This representation wouldn't be compatible with that goal. Thanks for the suggestion though!
rattigan
+2  A: 

What you need is a variation of topological sorting. This is an algorithm that "sorts" graph vertexes as if a---->b edge meant a > b. Since the graph is a DAG, there is no cycles in it and this > relation is transitive, so at least one sorting order exists.

For your diamond-shaped graph two topological orders exist:

a1 b1 b2 c1
a1 b2 b1 c1

b1 and b2 items are not connected, even indirectly, therefore, they may be placed in any order.

After you sorted the graph, you know an approximation of order. My proposal is to fill the table in a straightforward way (1 vertex per line) and then "compact" the table. Perform sorting and pick the sequence you got as output. Fill the table from top to bottom, assigning a vertex to relevant column:

a  b  c
--------
a1 
   b2 
   b1
      c1

Now compact the table by walking from top to bottom (and then make similar pass from bottom to top). On each iteration, you take a closer look to a "current" row (marked as =>) and to the "next" row.

  1. If in a column nodes in current and next node differ, do nothing for this column:

         from     ---->      to
       X  b  c            X  b  c
       --------           --------
    => X1 .  .           X1 .  .
       X2 .  .        => X2 .  .
    
  2. If in a column X in the next row there is no vertex (table cell is empty) and in the current row there is vertex X1, then you sometimes should fill this empty cell with a vertex in the current row. But not always: you want your table to be logical, don't you? So copy the vertex if and only if there's no edge b--->X1, c--->X1, etc, for all vertexes in current row.

         from     --->      to
       X  b  c           X  b  c
       --------          --------
    => X1 b  c           X1 b  c
          b1 c1       => X1 b1 c1
    

(Edit:) After first (forward) and second (backward) passes, you'll have such tables:

 first       second
a  b  c     a  b  c 
--------    --------
a1          a1 b2 c1     
a1 b2       a1 b2 c1  
a1 b1       a1 b1 c1  
a1 b1 c1    a1 b1 c1

Then, just remove equal rows and you're done:

a  b  c 
--------
a1 b2 c1  
a1 b1 c1

And you should get a nice table. O(n^2).

Pavel Shved
This is an interesting approach. However, I'm wondering if it is deterministic - does the result depend on which topological ordering you choose, at least for more complex DAGs? I don't care if the same set of rows results in a different order, but I wouldn't be happy with an algorithm that can result in a different set of rows. I will update the question to reflect this requirement.
rattigan
I tried to do this with the DAG with edges a1->b1, b1->c1, c1->d1, a1->b2, b2->c2, c2->d1. I'm not sure I understand rule two, so I wasn't able to do it. But the point is this graph has topological orderings that seem like they could produce different results: a1,b1,c1,b2,c2,d1 and a1,b1,b2,c1,c2,d1. Do you see what I mean?
rattigan
@rattigan, yes, it can produce different results. You can perform more specific crunching to sort records (for example, to add more edges to your graph), but it's out of scope of my answer. As for rule two, I'm fixing wording for you to try it again.
Pavel Shved
A: 

It sounds like a train system map with stations within zones (a,b,c).

You could be generating a table of all possible routes in one direction. In which case "a1, b1, c1" would seem to imply a1->b1 so don't format it like that if you have only a1->c1, b1->c1

You could decide to produce a table by listing the longest routes starting in zone a, using each edge only once, ending with the short leftover routes. Or allow edges to be reused only if they connect unused edges or extend a route.

In other words, do a depth first search, trying not to reuse edges (reject any path that doesn't include unused edges, and optionally trim used edges at the endpoints).

Erik Olson
I like this idea. However, does the set of rows produced depend on the order of traversal of the table? (see edit in question) Perhaps this variation is also an option: form the set of all possible routes - excluding routes that are subroutes of other routes - and emit a row for each route.
rattigan
A: 

Here's what I ended up doing:

  • Find all paths emanating from a node without in-edges. (Could be expensive for some graphs, but works for mine)
  • Traverse each path to collect a row of values
  • Compact the rows

Compacting the rows is dones as follows.

  • For each pair of columns x,y
    • Construct a map of every value of x to it's possible values of y
    • Create another map For entries that only have one distinct value of y, mapping the value of x to its single value of y.
  • Fill in the blanks using these maps. When filling in a value, check for related blanks that can be filled.

This gives a very compact output and seems to meet all my requirements.

rattigan