views:

524

answers:

2

I am having trouble with a QT clustering implementation that I want to implement in Perl.

Algorithm

The line beginning with "identify set", the third from the end, is the part I can't figure out.

The full paper is available here.

+4  A: 

A sub i is a cluster. {A sub 1, A sub 2, ..., A sub |G|} is a cluster of clusters.

Identify set C in {A sub 1, A sub 2, ..., A sub |G|} with maximum cardinality means finding the largest cluster A sub i.

In perl, if the cluster of clusters is:

my @bigun = (
                [1, 2, 3],
                [4, 5, 6, 7],
                [8]
            );

then

# @C = @{ $bigun[1] };

use List::Util qw/reduce/;
my @C = @{ reduce { @$a > @$b ? $a : $b } @bigun };
Pedro Silva
thank you!have another question, how to find distance from A_i to j, if A_i has more that 1 elements?thank you!
Alexandr
Find the center point in the cluster, then compute the euclidean distance between that point A_icenter and j, I suppose.Edit: actually, from wikipedia:"The distance between a point and a group of points is computed using complete linkage, i.e. as the maximum distance from the point to any member of the group (see the "Agglomerative hierarchical clustering" section about distance between clusters)."
Pedro Silva
i mean how (mathematically) to find distance between for example @{$bigun[0]} and @{$bigun[1]}, thank you.
Alexandr
oh, ok, i got it, i hope. thank you!
Alexandr
Well, in 1D: \sqrt [(x - y) ^ 2]; in 2D: \sqrt [(x1 - y1) ^ 2 + (x2 - y2) ^ 2], etc etc.
Pedro Silva
A: 

Dear Sirs: Trying to implement same algorithm. I'm a little confused how do we calculate diameter of a cluster? Thank you! misha680 at gmail dot com

Misha Koshelev
from wikipedia: "The distance between a point and a group of points is computed using complete linkage, i.e. as the maximum distance from the point to any member of the group (see the "Agglomerative hierarchical clustering" section about distance between clusters)."so it's about "complete linkage". diameter in this case is the maximum distance between 2 elements in the cluster (u should check all distances between each two elements and choose the biggest distance)
Alexandr