views:

35

answers:

1

What would be the optimal way of clustering nodes after cutting a MST with a maximum edge-length? My output from MST edge-length cut is a 2xN array where each element is an integer. The integers are the node identifiers that describe the edges. An example of the output would is given below:

>>> print array[0:3]
[[  0   1]
 [  0   2]
 [  2  17]]

I'm typically dealing with 100 to 20,000 nodes. My MST code is sufficiently fast, but it's being bogged down by the clustering/grouping algorithm. It's a loop-heavy set of functions and that's what is slowing it down. Check out the following code. Any ideas on how to speed it up? I'm aware that this is a brute force method, so a cleaner method would be best. Thanks in advance for your help!

Cheers,

Eli

def _super_intersection(edges):
    group = set(edges[0])
    index = np.array([0])
    k = 0
    while k < 100:
        k += 1
        i = 0
        for edge in edges[1:]:
             i += 1
             edge = set(edge)
             if group & edge:
                 group = group | edge
                 index = np.append(index, i)

index = np.unique(np.array(index))
return group, index


def cluster(self, gmin = 5):
    # A 2xN array of node IDs
    edges = self.edges
    group_nodes = {}
    for no, edge in enumerate(edges):
        try:
            group, indice = _super_intersection(edges)
            id_no = no                
            edges = np.delete(edges,indice,0)
            if len(group) >= gmin:
                group_nodes[id_no] = list(group)
        except:
            self.group_nodes = group_nodes
A: 

The problem has been solved. Go to the NetworkX google group link to see the solution.

http://groups.google.com/group/networkx-discuss/browse_thread/thread/4ac4250d460a1b75

Cheers,

Eli

ebressert
That message explains the problem clearly (and with nice pictures), thanks!
Marius Gedminas
Also, can you mark this answer as accepted?
Marius Gedminas