views:

78

answers:

3

I have data in the below form, which makes up a bipartite network.

A1 - B1
A2 - B2
A2 - B1
A3 - B1
A4 - B2
A5 - B3
A6 - B3
A7 - B3
A7 - B3
A8 - B4
A9 - B3

What I would like to do is write something (ideally in python or C) or use an existing library to identify individual communities within the data. For instance

A1,A2,A3,A4 are all part of the same community because they connect to B1,B2 similarly A5,A6,A7,A8,A9 all connected to B3 and B4.

I am a bit confused having read lots various articles about network flow and graphs as to exactly where my problem sits. Is this just a form of Breadth-first search or is there a more efficient means of doing this?

Thanks

+1  A: 

If you want to use Python, read about the NetworkX library. It has lots of modules and algorithm implementations for graphs. In particular, you may find the Bipartite module useful. I'm not sure what you mean by "communities", but the bipartite_color function from that module may help you.

Eli Bendersky
By communities I mean that within my data I will have several unconnected bipartite graphs, I want a way of establishing all A and B's that relate to each unconnected bipartite graph within the data.
David
@David: then try first running the "connected components" algorithm on your data (that one also exists in NetworkX) to find the connected sub graphs. Then, you can determine the bi-partite segmentation of each component/subgraph.
Eli Bendersky
A: 

Maybe something like:

import collections

data = ( ("A1", "B1"), ("A2", "B2"), ("A2", "B1") )
out = collections.defaultdict(list)

for value, key in data:
  out[key].append(value)

print out
-> defaultdict(<type 'list'>, {'B1': ['A1', 'A2'], 'B2': ['A2']})

This only works one-way though. You could of course make 2 dicts, one with the A set as key and one with the B set as key. It assumes that the keys are immutable (strings, numbers).

extraneon
I think if I was to go down this path I would have to use the two dicts like you say as B1 is related to B2 due to A2 being a value in both.
David
+2  A: 

Using Python and the igraph library, you may do the following:

import igraph
graph = igraph.Graph.Formula("A1-B1, A2-B2, A2-B1, A3-B1, A4-B2, A5-B3, A6-B3, A7-B3, A8-B4, A9-B3")
comms = graph.clusters()
for comm in comms:
    print ", ".join(graph.vs[comm]["name"])

A short explanation: Graph.Formula constructs a graph out of a string representation like the one above, but you can use any other method provided by igraph to construct your graph. An advantage of using Graph.Formula is that it automatically creates a name vertex attribute containing the vertex names. graph.clusters() searches for the connected components of the network and returns a VertexClustering object. This object can be used in a for loop to iterate over the components. In the core of the for loop, the comm variable will always contain the indices of the nodes in the current community. I select the vertices of the community using graph.vs[comm], request their names as a list (graph.vs[comm]["name"]) and then join the names by commas.

Tamás