I'm rewriting a data-driven legacy application in Python. One of the primary tables is referred to as a "graph table", and does appear to be a directed graph, so I was exploring the NetworkX package to see whether it would make sense to use it for the graph table manipulations, and really implement it as a graph rather than a complicated set of arrays.
However, I'm starting to wonder whether the way we use this table is poorly suited to an actual graph-manipulation library. Most of the NetworkX functionality seems oriented toward characterizing the graph itself in some way, finding the shortest path between two nodes, and so on. None of that is relevant to my application.
I'm hoping that if I describe the actual usage here, someone can tell me whether I'm just missing something (I've never really worked with graphs before, so this is quite possible) or whether I should be exploring some other data structure. (And if so, what would you suggest?)
We use the table primarily to transform a user-supplied string of keywords into an ordered list of components. This constitutes 95% of the use cases; the other 5% are "given a partial keyword string, supply all possible completions" and "generate all possible legal keyword strings". Oh, and validate the graph against malformation.
Here's an edited excerpt of the table. Columns are:
keyword  innode  outnode  component
acs      1       20       clear
default  1       100      clear
noota    20      30       clear
default  20      30       hst_ota
ota      20      30       hst_ota
acs      30      10000    clear
cos      30      11000    clear
sbc      10000   10199    clear
hrc      10000   10150    clear
wfc1     10000   10100    clear
default  10100   10101    clear
default  10101   10130    acs_wfc_im123
f606w    10130   10140    acs_f606w
f550m    10130   10140    acs_f550m
f555w    10130   10140    acs_f555w
default  10140   10300    clear
wfc1     10300   10310    acs_wfc_ebe_win12f
default  10310   10320    acs_wfc_ccd1
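Even without a graph library, this table is already a directed graph in adjacency-list form once the rows are grouped by innode: each edge simply carries its keyword and component. Here is a minimal sketch. It assumes the table arrives as whitespace-separated text laid out like the excerpt, and `TABLE`, `Edge`, and `load_graph` are hypothetical names:

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    keyword: str
    outnode: int
    component: str


# The excerpt above as whitespace-separated text (header row included).
TABLE = """\
keyword innode outnode component
acs 1 20 clear
default 1 100 clear
noota 20 30 clear
default 20 30 hst_ota
ota 20 30 hst_ota
acs 30 10000 clear
cos 30 11000 clear
sbc 10000 10199 clear
hrc 10000 10150 clear
wfc1 10000 10100 clear
default 10100 10101 clear
default 10101 10130 acs_wfc_im123
f606w 10130 10140 acs_f606w
f550m 10130 10140 acs_f550m
f555w 10130 10140 acs_f555w
default 10140 10300 clear
wfc1 10300 10310 acs_wfc_ebe_win12f
default 10310 10320 acs_wfc_ccd1
"""


def load_graph(text: str) -> dict[int, list[Edge]]:
    """Group rows by innode, keeping the table's row order for each node."""
    graph: dict[int, list[Edge]] = defaultdict(list)
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        keyword, innode, outnode, component = line.split()
        graph[int(innode)].append(Edge(keyword, int(outnode), component))
    return dict(graph)


graph = load_graph(TABLE)
```

A node that never appears as an innode (such as 10320) has no entry, which is exactly the "doesn't exist, so we're done" condition in the walk-through below.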
Given the keyword string "acs,wfc1,f555w" and this table, the traversal logic is:
Start at node 1; "acs" is in the string, so go to node 20.
None of the keywords offered at node 20 (noota, ota) are in the string, so choose the default, pick up hst_ota, and go to node 30.
"acs" is in the string, so go to node 10000.
"wfc1" is in the string, so go to node 10100.
Only one choice; go to node 10101.
Only one choice, so pick up acs_wfc_im123 and go to node 10130.
"f555w" is in the string, so pick up acs_f555w and go to node 10140.
Only one choice, so go to node 10300.
"wfc1" is in the string, so pick up acs_wfc_ebe_win12f and go to node 10310.
Only one choice, so pick up acs_wfc_ccd1 and go to node 10320 -- which doesn't exist, so we're done.
Thus the final list of components is:
hst_ota
acs_wfc_im123
acs_f555w
acs_wfc_ebe_win12f
acs_wfc_ccd1
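The traversal just described can be sketched over a plain adjacency list. The sketch relies on rules inferred from the walk-through, not stated elsewhere. A keyword edge is taken if its keyword appears in the string; otherwise the `default` edge is taken. More than one matching keyword, or no match and no default, is treated as an error. `clear` contributes no component, and the walk ends at a node with no outgoing rows. The sketch also rejects keywords that were never consumed. `Edge`, `GRAPH`, `choose_edge`, and `components_for` are hypothetical names:

```python
from __future__ import annotations

from typing import NamedTuple


class Edge(NamedTuple):
    keyword: str
    outnode: int
    component: str


# innode -> outgoing edges, in table order (the excerpt above)
GRAPH: dict[int, list[Edge]] = {
    1: [Edge("acs", 20, "clear"), Edge("default", 100, "clear")],
    20: [Edge("noota", 30, "clear"), Edge("default", 30, "hst_ota"),
         Edge("ota", 30, "hst_ota")],
    30: [Edge("acs", 10000, "clear"), Edge("cos", 11000, "clear")],
    10000: [Edge("sbc", 10199, "clear"), Edge("hrc", 10150, "clear"),
            Edge("wfc1", 10100, "clear")],
    10100: [Edge("default", 10101, "clear")],
    10101: [Edge("default", 10130, "acs_wfc_im123")],
    10130: [Edge("f606w", 10140, "acs_f606w"), Edge("f550m", 10140, "acs_f550m"),
            Edge("f555w", 10140, "acs_f555w")],
    10140: [Edge("default", 10300, "clear")],
    10300: [Edge("wfc1", 10310, "acs_wfc_ebe_win12f")],
    10310: [Edge("default", 10320, "acs_wfc_ccd1")],
}


def choose_edge(edges: list[Edge], keywords: set[str]) -> Edge:
    """Take the edge whose keyword is in the string, else the default edge."""
    matches = [e for e in edges if e.keyword in keywords]
    if len(matches) > 1:
        raise ValueError(f"ambiguous keywords: {[e.keyword for e in matches]}")
    if matches:
        return matches[0]
    defaults = [e for e in edges if e.keyword == "default"]
    if not defaults:
        raise ValueError(f"no keyword given for choices {[e.keyword for e in edges]}")
    return defaults[0]


def components_for(keyword_string: str, graph: dict[int, list[Edge]] = GRAPH,
                   start: int = 1) -> list[str]:
    keywords = {k.strip() for k in keyword_string.split(",") if k.strip()}
    used: set[str] = set()
    components: list[str] = []
    node = start
    while node in graph:  # a node with no outgoing rows ends the walk
        edge = choose_edge(graph[node], keywords)
        if edge.keyword != "default":
            used.add(edge.keyword)
        if edge.component != "clear":  # "clear" contributes no component
            components.append(edge.component)
        node = edge.outnode
    unused = keywords - used
    if unused:
        raise ValueError(f"unused keywords: {sorted(unused)}")
    return components
```

`components_for("acs,wfc1,f555w")` returns the five components listed above. The keywords are treated as a set rather than consumed one at a time, because `acs` and `wfc1` each steer the walk twice (at nodes 1 and 30, and at 10000 and 10300).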
I can make a graph from just the innodes and outnodes of this table, but I couldn't for the life of me figure out how to build in the keyword information that determines which choice to make when faced with multiple possibilities.
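NetworkX edges can carry arbitrary attributes, so the keyword and component can live on the edge itself. The one trap is that several rows share the same innode and outnode (three rows go from 20 to 30). The graph therefore has to be a `MultiDiGraph`, because a plain `DiGraph` would merge those rows into one edge. This is a sketch using the walk rules from the steps above: take the edge whose keyword is in the string, otherwise the `default` edge, and collect every component except `clear`. `ROWS` and `walk` are hypothetical names:

```python
import networkx as nx

# (keyword, innode, outnode, component) rows from the excerpt above
ROWS = [
    ("acs", 1, 20, "clear"), ("default", 1, 100, "clear"),
    ("noota", 20, 30, "clear"), ("default", 20, 30, "hst_ota"),
    ("ota", 20, 30, "hst_ota"),
    ("acs", 30, 10000, "clear"), ("cos", 30, 11000, "clear"),
    ("sbc", 10000, 10199, "clear"), ("hrc", 10000, 10150, "clear"),
    ("wfc1", 10000, 10100, "clear"),
    ("default", 10100, 10101, "clear"),
    ("default", 10101, 10130, "acs_wfc_im123"),
    ("f606w", 10130, 10140, "acs_f606w"), ("f550m", 10130, 10140, "acs_f550m"),
    ("f555w", 10130, 10140, "acs_f555w"),
    ("default", 10140, 10300, "clear"),
    ("wfc1", 10300, 10310, "acs_wfc_ebe_win12f"),
    ("default", 10310, 10320, "acs_wfc_ccd1"),
]

# MultiDiGraph keeps parallel edges; keyword/component become edge attributes.
G = nx.MultiDiGraph()
for keyword, innode, outnode, component in ROWS:
    G.add_edge(innode, outnode, keyword=keyword, component=component)


def walk(G, keywords, start=1):
    """Follow the matching (else default) edge until a node has no out-edges."""
    components = []
    node = start
    while G.out_degree(node) > 0:
        out = [(v, data) for _, v, data in G.out_edges(node, data=True)]
        matches = [(v, d) for v, d in out if d["keyword"] in keywords]
        if not matches:
            matches = [(v, d) for v, d in out if d["keyword"] == "default"]
        if len(matches) != 1:
            raise ValueError(f"cannot choose an edge at node {node}")
        node, data = matches[0]
        if data["component"] != "clear":
            components.append(data["component"])
    return components
```

The walk itself only needs `out_edges` and `out_degree`. The keyword logic stays in ordinary Python code.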
Updated to add examples of the other use cases:
Given a string "acs", return ("hrc","wfc1") as possible legal next choices
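Completions can reuse the same walk. Follow the string as far as it determines the path. When the walk reaches a node where none of the offered keywords is in the string and there is no `default`, report those keywords. This is a sketch assuming the walk rules from the steps above, and `GRAPH` and `next_choices` are hypothetical names. Run on the excerpt, "acs" stops at node 10000 and also offers `sbc`, because the excerpt has an `sbc` row there:

```python
# innode -> [(keyword, outnode, component), ...], from the excerpt above
GRAPH = {
    1: [("acs", 20, "clear"), ("default", 100, "clear")],
    20: [("noota", 30, "clear"), ("default", 30, "hst_ota"), ("ota", 30, "hst_ota")],
    30: [("acs", 10000, "clear"), ("cos", 11000, "clear")],
    10000: [("sbc", 10199, "clear"), ("hrc", 10150, "clear"), ("wfc1", 10100, "clear")],
    10100: [("default", 10101, "clear")],
    10101: [("default", 10130, "acs_wfc_im123")],
    10130: [("f606w", 10140, "acs_f606w"), ("f550m", 10140, "acs_f550m"),
            ("f555w", 10140, "acs_f555w")],
    10140: [("default", 10300, "clear")],
    10300: [("wfc1", 10310, "acs_wfc_ebe_win12f")],
    10310: [("default", 10320, "acs_wfc_ccd1")],
}


def next_choices(keyword_string, graph=GRAPH, start=1):
    """Follow a partial keyword string; return the keywords offered where it stops."""
    keywords = {k.strip() for k in keyword_string.split(",") if k.strip()}
    node = start
    while node in graph:
        edges = graph[node]
        matches = [e for e in edges if e[0] in keywords]
        if len(matches) > 1:
            raise ValueError(f"ambiguous keywords at node {node}")
        if not matches:
            matches = [e for e in edges if e[0] == "default"]
        if not matches:
            # The string has not made this choice yet: these are the completions.
            return [e[0] for e in edges]
        node = matches[0][1]
    return []  # the string already determines a complete path
```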
Given a string "acs, wfc1, foo", raise an exception because the keyword "foo" is never used
Return all possible legal strings:
- cos
- acs, hrc
- acs, wfc1, f606w
- acs, wfc1, f550m
- acs, wfc1, f555w
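One way to enumerate legal strings starts from an assumption: a string is legal when the walk (keyword edge if present, otherwise `default`) is unambiguous, reaches a node with no outgoing rows, and consumes every keyword. Under that assumption, the approach is to collect the keywords along every path from node 1 (skipping `default` and repeats) and then keep only the candidates that pass the walk. Skipping the empty string is also an assumption. `GRAPH`, `paths`, `is_legal`, and `all_legal_strings` are hypothetical names, and the recursion assumes the graph has no loops:

```python
# innode -> [(keyword, outnode, component), ...], from the excerpt above
GRAPH = {
    1: [("acs", 20, "clear"), ("default", 100, "clear")],
    20: [("noota", 30, "clear"), ("default", 30, "hst_ota"), ("ota", 30, "hst_ota")],
    30: [("acs", 10000, "clear"), ("cos", 11000, "clear")],
    10000: [("sbc", 10199, "clear"), ("hrc", 10150, "clear"), ("wfc1", 10100, "clear")],
    10100: [("default", 10101, "clear")],
    10101: [("default", 10130, "acs_wfc_im123")],
    10130: [("f606w", 10140, "acs_f606w"), ("f550m", 10140, "acs_f550m"),
            ("f555w", 10140, "acs_f555w")],
    10140: [("default", 10300, "clear")],
    10300: [("wfc1", 10310, "acs_wfc_ebe_win12f")],
    10310: [("default", 10320, "acs_wfc_ccd1")],
}


def paths(node, seq, graph):
    """Yield the keyword sequence of every path from node to a node with no rows."""
    if node not in graph:
        yield seq
        return
    for keyword, outnode, _component in graph[node]:
        if keyword == "default" or keyword in seq:
            yield from paths(outnode, seq, graph)  # adds no new keyword
        else:
            yield from paths(outnode, seq + (keyword,), graph)


def is_legal(keywords, graph, start=1):
    """True if the walk is unambiguous, reaches an end node, and uses every keyword."""
    used = set()
    node = start
    while node in graph:
        edges = graph[node]
        matches = [e for e in edges if e[0] in keywords]
        if not matches:
            matches = [e for e in edges if e[0] == "default"]
        if len(matches) != 1:
            return False
        if matches[0][0] != "default":
            used.add(matches[0][0])
        node = matches[0][1]
    return used == set(keywords)


def all_legal_strings(graph=GRAPH, start=1):
    result = []
    for seq in paths(start, (), graph):
        # Skip the empty string and any candidate the walk itself would reject.
        if seq and seq not in result and is_legal(seq, graph, start):
            result.append(seq)
    return result
```

On the excerpt this yields 15 strings, including variants with `noota`, `ota`, or `sbc`. None of them contains `cos`: in the excerpt, `cos` is only reachable after `acs`, which makes node 30 ambiguous. The output therefore does not reproduce the list above exactly.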
Validate that all nodes can be reached and that there are no loops.
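Both checks are standard graph traversals. Nodes not visited by a search from node 1 are unreachable. Kahn's topological-sort algorithm can remove every node only when there is no loop. Here is a plain-Python sketch over the innode/outnode pairs, assuming node 1 is the only start node (`EDGES` and `validate` are hypothetical names):

```python
from collections import defaultdict

# (innode, outnode) pairs from the excerpt above; parallel rows repeat
EDGES = [
    (1, 20), (1, 100), (20, 30), (20, 30), (20, 30), (30, 10000), (30, 11000),
    (10000, 10199), (10000, 10150), (10000, 10100), (10100, 10101),
    (10101, 10130), (10130, 10140), (10130, 10140), (10130, 10140),
    (10140, 10300), (10300, 10310), (10310, 10320),
]


def validate(edges, start=1):
    """Return (nodes unreachable from start, whether the graph has a loop)."""
    succ = defaultdict(set)  # parallel edges collapse; they don't affect either check
    nodes = set()
    for u, v in edges:
        succ[u].add(v)
        nodes.update((u, v))

    # Reachability: iterative depth-first search from the start node.
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for v in succ[node]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    unreachable = nodes - seen

    # Loops: Kahn's algorithm removes every node only if there is no cycle.
    indegree = {n: 0 for n in nodes}
    for u in list(succ):
        for v in succ[u]:
            indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for v in succ[n]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return unreachable, removed < len(nodes)
```

With NetworkX, `nx.descendants(G, 1)` and `nx.is_directed_acyclic_graph(G)` cover the same two checks.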
I can tweak Alex's solution for the first two of these, but I don't see how to do it for the last two.