I have a data.frame with 2 columns: Node A, Node B. Each entry in the frame implies an edge in a graph between node A and B.

There must be a nice one-liner to convert this data.frame into an adjacency list. Any hints?

A: 

How would you even represent an adjacency list in R? It needs variable-sized lists for the set of adjacent nodes, so you have to use a list(); but then what good is it having it in R?

I can think of lame tricks with sapply-like functions, but they do a linear scan for every node. After playing around for a minute, here is a list of two-element lists, where the second element of each is the adjacency list. The printed output looks crazier than the data structure really is.

> edgelist=data.frame(A=c(1,1,2,2,2),B=c(1,2,2,3,4))
> library(plyr)
> llply(1:max(edgelist), function(a) list(node=a, adjacents=as.list(edgelist$B[edgelist$A==a])))
[[1]]
[[1]]$node
[1] 1

[[1]]$adjacents
[[1]]$adjacents[[1]]
[1] 1

[[1]]$adjacents[[2]]
[1] 2



[[2]]
[[2]]$node
[1] 2

[[2]]$adjacents
[[2]]$adjacents[[1]]
[1] 2

[[2]]$adjacents[[2]]
[1] 3

[[2]]$adjacents[[3]]
[1] 4



[[3]]
[[3]]$node
[1] 3

[[3]]$adjacents
list()


[[4]]
[[4]]$node
[1] 4

[[4]]$adjacents
list()
Brendan OConnor
Brendan: the standard way (at least from igraph's point of view) is a list of vertices, where each list element is a vector of adjacent vertices.
Josh Reich
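For reference, the "list of vertices, each holding a vector of adjacent vertices" layout described in the comment above can be sketched in base R with `split()`. This is only an illustration on the toy `edgelist` from this answer, not igraph's own code; the variable names `nodes` and `adj` are made up here.

```r
# Same toy edge list as in the answer above.
edgelist <- data.frame(A = c(1, 1, 2, 2, 2), B = c(1, 2, 2, 3, 4))

# Every node that appears in either column, so nodes without
# outgoing edges still get an (empty) entry.
nodes <- sort(unique(c(edgelist$A, edgelist$B)))

# split() groups B by A; a factor with explicit levels keeps
# the empty groups as zero-length vectors.
adj <- split(edgelist$B, factor(edgelist$A, levels = nodes))

adj[["2"]]    # neighbours of node 2
lengths(adj)  # out-degree of every node
```

The result is a plain named list of numeric vectors, so it prints far more compactly than the nested list above.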
+2  A: 

Quick and dirty ...

> edges <- data.frame(nodea=c(1,2,4,2,1), nodeb=c(1,2,3,4,5))

> adjlist <- by(edges, edges$nodea, function(x) x$nodeb)

> for (i in as.character(unique(edges$nodea))) {
+   cat(i, ' -> ', adjlist[[i]], '\n')
+ }

1  ->  1 5
2  ->  2 4
4  ->  3

> adjlist
edges$nodea: 1
[1] 1 5
------------------------------------------------------------
edges$nodea: 2
[1] 2 4
------------------------------------------------------------
edges$nodea: 4
[1] 3
ars
Guh. Yep. That is a perfect one-liner. Oddly enough, my for-loop solution runs twice as fast as by().
Josh Reich
Indeed, it's not very fast when your table is 50,000 rows long (with ~5000 identifiers). Are there faster alternatives?
Yannick Wurm
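One candidate worth trying is base R's `split()`, which groups one vector by another without building a sub-data.frame per group the way `by()` does. The sketch below uses synthetic data that only mimics the sizes mentioned in the comment above; it has not been benchmarked here, so time it on your own table.

```r
set.seed(1)  # synthetic data, only to mimic the sizes mentioned above
edges <- data.frame(nodea = sample(5000, 50000, replace = TRUE),
                    nodeb = sample(5000, 50000, replace = TRUE))

# Group nodeb by nodea; returns a plain named list of vectors.
adjlist <- split(edges$nodeb, edges$nodea)

# Optional: drop duplicate edges per node.
adjlist <- lapply(adjlist, unique)

# Timing is machine dependent, so compare on your own data:
# system.time(split(edges$nodeb, edges$nodea))
# system.time(by(edges, edges$nodea, function(x) x$nodeb))
```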
+3  A: 
> edges <- data.frame(nodea=c(1,2,4,2,1), nodeb=c(1,2,3,4,5))

> attach(edges)

> tapply(nodeb,nodea,unique)

$`1`
[1] 1 5

$`2`
[1] 2 4

$`4`
[1] 3
For some strange reason internal to R, `tapply(as.character(nodeb), as.character(nodea), unique)` is hundreds of times faster at converting my very long table (100,000 lines) to a list than `tapply(nodeb, nodea, unique)`!
Yannick Wurm
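If you want to check that the two `tapply()` variants from the comment above agree before relying on the faster one, a small sketch like this may help. The data is synthetic, and the helper names `num`, `chr`, `same_keys` and `same_values` are made up for illustration. Note that the character variant returns character vectors and orders its keys as strings.

```r
set.seed(42)  # synthetic edge list, sizes chosen arbitrarily
edges <- data.frame(nodea = sample(200, 2000, replace = TRUE),
                    nodeb = sample(200, 2000, replace = TRUE))

num <- tapply(edges$nodeb, edges$nodea, unique)
chr <- tapply(as.character(edges$nodeb), as.character(edges$nodea), unique)

# Same set of keys, although the character version sorts them as strings.
same_keys <- setequal(names(num), names(chr))

# Same neighbours once the strings are converted back to numbers.
same_values <- all(vapply(names(num), function(k)
  identical(as.numeric(chr[[k]]), as.numeric(num[[k]])), logical(1)))

# Wrap each tapply() call in system.time() to measure the gap on your machine.
```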