views:

219

answers:

1

Hi,

I'm not particularly accustomed to generating complex SQL queries and am having difficulty in mixing my understanding of procedural languages and of set-based operations in devising a recursive query for network traversal. I wish to find the set of edges that lie 'upstream' of a particular node through conducting a depth first search of a directed graph (each node can have more than one upstream edge) and would ideally to implement this in SQL.

The pseudocode for what I wish to do looks like the following:

interesting_pipes = Pipes[]

func find_all_pipes_upstream(node n)
 if is_inlet(nodename)
    return Nil
 else
   for p in upstream_pipes:
     if p in interesting_pipes:
       return Nil
   else
     interesting_pipes.append(p)
     find_all_pipes_upstream(upstream_node(p))

I've already written the following functions in pure SQL:

upstream_pipes(node_name varchar) RETURNS SETOF "Pipes"
upstream_node(pipe_name varchar) RETURNS "Node"
is_inlet(node_name) RETURNS boolean

but am struggling to figure out how to manage scoping and return types when translating the above pseudocode to either a PostgreSQL WITH RECURSIVE query or a PL/pgSQL function. Most of the examples I've seen of WITH RECURSIVE queries have been devised for traversing back through trees where each node has just one parent. Has anyone any tips/advise as how best to go about this?

Cheers,

Will

A: 

See:

http://www.postgresql.org/docs/8.4/static/queries-with.html

This query will loop if the link relationships contain cycles. Because we require a "depth" output, just changing UNION ALL to UNION would not eliminate the looping. Instead we need to recognize whether we have reached the same row again while following a particular path of links. We add two columns path and cycle to the loop-prone query:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
        SELECT g.id, g.link, g.data, 1,
          ARRAY[g.id],
          false
        FROM graph g
      UNION ALL
        SELECT g.id, g.link, g.data, sg.depth + 1,
          path || g.id,
          g.id = ANY(path)
        FROM graph g, search_graph sg
        WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
Denis
Thanks, Denis. I was able to use the 'WITH RECURSIVE' query featured in the example you cited after realizing that I needed to create a view 'graph(pipe_id, downstream_node_id, upstream_node_id)', which was done as a non-recursive WITH sub-query to ensure it was only executed once for each invocation of the main graph traversal query. My graph traversals now take 1/200 of the time that they did when I was using Python/ODBC.
Will Furnass