What would be the optimal way of clustering nodes after cutting a MST with a maximum edge-length? My output from MST edge-length cut is a 2xN array where each element is an integer. The integers are the node identifiers that describe the edges. An example of the output would is given below:
>>> print array[0:3]
[[ 0 1]
[ 0 2]
[ 2 17]]
I'm typically dealing with 100 to 20,000 nodes. My MST code is sufficiently fast, but it's being bogged down by the clustering/grouping algorithm. It's a loop-heavy set of functions and that's what is slowing it down. Check out the following code. Any ideas on how to speed it up? I'm aware that this is a brute force method, so a cleaner method would be best. Thanks in advance for your help!
Cheers,
Eli
def _super_intersection(edges):
group = set(edges[0])
index = np.array([0])
k = 0
while k < 100:
k += 1
i = 0
for edge in edges[1:]:
i += 1
edge = set(edge)
if group & edge:
group = group | edge
index = np.append(index, i)
index = np.unique(np.array(index))
return group, index
def cluster(self, gmin = 5):
# A 2xN array of node IDs
edges = self.edges
group_nodes = {}
for no, edge in enumerate(edges):
try:
group, indice = _super_intersection(edges)
id_no = no
edges = np.delete(edges,indice,0)
if len(group) >= gmin:
group_nodes[id_no] = list(group)
except:
self.group_nodes = group_nodes