I have data representing the paths people take across a fixed set of discrete points (i.e., nodes and edges). So far I have been using igraph. I haven't yet found a good way (in igraph or another package) to create "canonical" paths summarizing what significant sub-groups of respondents are doing. A canonical path can be operationalized in any reasonable way; it is just meant to represent a typical path or sub-path for a significant portion of the population. Is there already a function to create these in igraph or another package? Thanks.

+1  A: 

One option: represent each move a person makes from one point to the next as a directed edge. Create an aggregate graph in which each edge has a weight equal to the number of times that edge occurred. The edges with large weights will be the "typical" 1-paths.
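A rough sketch of that aggregation in plain Python rather than igraph. The `paths` data and the `edge_weights` helper are made up for illustration; each path is assumed to be the ordered list of nodes one respondent visited.

```python
from collections import Counter

# Hypothetical data: one ordered list of visited nodes per respondent.
paths = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["b", "c", "a", "b"],
]

def edge_weights(paths):
    """Count how often each directed edge (u, v) is traversed across all paths."""
    counts = Counter()
    for path in paths:
        # zip(path, path[1:]) yields the consecutive (from, to) pairs.
        counts.update(zip(path, path[1:]))
    return counts

weights = edge_weights(paths)
# The heaviest edges are the candidates for "typical" 1-paths.
top_edges = weights.most_common(2)
```

The resulting (from, to, weight) triples could then be loaded into igraph as a weighted directed graph if you want to plot or analyze the aggregate network.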

Of course, it gets more interesting to find common k-paths or explore how paths vary among individuals. The naive approach for 2-paths would be to create N additional nodes that correspond to nodes when visited in the middle of the 2-path. For example, if you have nodes a_1, ..., a_N you would create nodes b_1, ..., b_N. The aggregate network might have an edge (a_3, b_5, 10) and an edge (b_5, a_7, 10); this would represent the two-path (a_3, b_5, a_7) occurring 10 times. The task you're interested in corresponds to finding those two-paths with large weights.

Both the igraph and network packages would suffice for this sort of analysis.

If you have some bound on k (e.g., nothing longer than a 6-path occurs in your dataset), I might also suggest enumerating all the paths that are taken and computing a histogram over the unique paths. I don't know of any functions that do this automagically for you.
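The enumeration is easy enough to do by hand. Here is a minimal sketch, again in plain Python with made-up data; `kpath_histogram` is a hypothetical helper, and it assumes a k-path means k consecutive edges (k + 1 nodes) within a single respondent's path.

```python
from collections import Counter

def kpath_histogram(paths, k):
    """Count every contiguous sub-path with k edges (k + 1 nodes)."""
    counts = Counter()
    for path in paths:
        # A path with n nodes contains n - k sub-paths of k edges.
        for i in range(len(path) - k):
            counts[tuple(path[i:i + k + 1])] += 1
    return counts

# Hypothetical data: one ordered list of visited nodes per respondent.
paths = [
    ["a", "b", "c", "d"],
    ["a", "b", "c"],
    ["e", "a", "b", "c"],
    ["a", "b"],
]

hist = kpath_histogram(paths, 2)
most_common_2path = hist.most_common(1)[0]
```

Running this for each k up to your bound and keeping the sub-paths whose counts clear some threshold gives one possible operationalization of a "canonical" path.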

Christopher DuBois