views:

587

answers:

2

Hello!

I was looking for a way to calculate dynamic market values in a soccer manager game. I asked this question here and got a very good answer from Alceu Costa.

I tried to code this algorithm (90 elements, 5 clustes) but it doesn't work correctly:

  1. In the first iteration, a high percentage of the elements changes its cluster.
  2. From the second iteration, all elements change their cluster.
  3. Since the algorithm normally works until convergence (no element changes its cluster), it doesn't finish in my case.
  4. So I set the end to the 15th iteration manually. You can see that it runs infinitely.

You can see the output of my algorithm here. What's wrong with it? Can you tell me why it doesn't work correctly?

I hope you can help me. Thank you very much in advance!

Here's the code:

<?php
include 'zzserver.php';
function distance($player1, $player2) {
    global $strengthMax, $maxStrengthMax, $motivationMax, $ageMax;
    // $playerX = array(strength, maxStrength, motivation, age, id);
    $distance = 0;
    $distance += abs($player1['strength']-$player2['strength'])/$strengthMax;
    $distance += abs($player1['maxStrength']-$player2['maxStrength'])/$maxStrengthMax;
    $distance += abs($player1['motivation']-$player2['motivation'])/$motivationMax;
    $distance += abs($player1['age']-$player2['age'])/$ageMax;
    return $distance;
}
function calculateCentroids() {
    global $cluster;
    $clusterCentroids = array();
    foreach ($cluster as $key=>$value) {
        $strenthValues = array();
        $maxStrenthValues = array();
        $motivationValues = array();
        $ageValues = array();
        foreach ($value as $clusterEntries) {
            $strenthValues[] = $clusterEntries['strength'];
            $maxStrenthValues[] = $clusterEntries['maxStrength'];
            $motivationValues[] = $clusterEntries['motivation'];
            $ageValues[] = $clusterEntries['age'];
        }
        if (count($strenthValues) == 0) { $strenthValues[] = 0; }
        if (count($maxStrenthValues) == 0) { $maxStrenthValues[] = 0; }
        if (count($motivationValues) == 0) { $motivationValues[] = 0; }
        if (count($ageValues) == 0) { $ageValues[] = 0; }
        $clusterCentroids[$key] = array('strength'=>array_sum($strenthValues)/count($strenthValues), 'maxStrength'=>array_sum($maxStrenthValues)/count($maxStrenthValues), 'motivation'=>array_sum($motivationValues)/count($motivationValues), 'age'=>array_sum($ageValues)/count($ageValues));
    }
    return $clusterCentroids;
}
function assignPlayersToNearestCluster() {
    global $cluster, $clusterCentroids;
    $playersWhoChangedClusters = 0;
    // BUILD NEW CLUSTER ARRAY WHICH ALL PLAYERS GO IN THEN START
    $alte_cluster = array_keys($cluster);
    $neuesClusterArray = array();
    foreach ($alte_cluster as $alte_cluster_entry) {
        $neuesClusterArray[$alte_cluster_entry] = array();
    }
    // BUILD NEW CLUSTER ARRAY WHICH ALL PLAYERS GO IN THEN END
    foreach ($cluster as $oldCluster=>$clusterValues) {
        // FOR EVERY SINGLE PLAYER START
        foreach ($clusterValues as $player) {
            // MEASURE DISTANCE TO ALL CENTROIDS START
            $abstaende = array();
            foreach ($clusterCentroids as $CentroidId=>$centroidValues) {
                $distancePlayerCluster = distance($player, $centroidValues);
                $abstaende[$CentroidId] = $distancePlayerCluster;
            }
            arsort($abstaende);
            if ($neuesCluster = each($abstaende)) {
                $neuesClusterArray[$neuesCluster['key']][] = $player; // add to new array
                // player $player['id'] goes to cluster $neuesCluster['key'] since it is the nearest one
                if ($neuesCluster['key'] != $oldCluster) {
                    $playersWhoChangedClusters++;
                }
            }
            // MEASURE DISTANCE TO ALL CENTROIDS END
        }
        // FOR EVERY SINGLE PLAYER END
    }
    $cluster = $neuesClusterArray;
    return $playersWhoChangedClusters;
}
// CREATE k CLUSTERS START
$k = 5; // Anzahl Cluster
$cluster = array();
for ($i = 0; $i < $k; $i++) {
    $cluster[$i] = array();
}
// CREATE k CLUSTERS END
// PUT PLAYERS IN RANDOM CLUSTERS START
$sql1 = "SELECT ids, staerke, talent, trainingseifer, wiealt FROM ".$prefix."spieler LIMIT 0, 90";
$sql2 = mysql_abfrage($sql1);
$anzahlSpieler = mysql_num_rows($sql2);
$anzahlSpielerProCluster = $anzahlSpieler/$k;
$strengthMax = 0;
$maxStrengthMax = 0;
$motivationMax = 0;
$ageMax = 0;
$counter = 0; // for $anzahlSpielerProCluster so that all clusters get the same number of players
while ($sql3 = mysql_fetch_assoc($sql2)) {
    $assignedCluster = floor($counter/$anzahlSpielerProCluster);
    $cluster[$assignedCluster][] = array('strength'=>$sql3['staerke'], 'maxStrength'=>$sql3['talent'], 'motivation'=>$sql3['trainingseifer'], 'age'=>$sql3['wiealt'], 'id'=>$sql3['ids']);
    if ($sql3['staerke'] > $strengthMax) { $strengthMax = $sql3['staerke']; }
    if ($sql3['talent'] > $maxStrengthMax) { $maxStrengthMax = $sql3['talent']; }
    if ($sql3['trainingseifer'] > $motivationMax) { $motivationMax = $sql3['trainingseifer']; }
    if ($sql3['wiealt'] > $ageMax) { $ageMax = $sql3['wiealt']; }
    $counter++;
}
// PUT PLAYERS IN RANDOM CLUSTERS END
$m = 1;
while ($m < 16) {
    $clusterCentroids = calculateCentroids(); // calculate new centroids of the clusters
    $playersWhoChangedClusters = assignPlayersToNearestCluster(); // assign each player to the nearest cluster
    if ($playersWhoChangedClusters == 0) { $m = 1001; }
    echo '<li>Iteration '.$m.': '.$playersWhoChangedClusters.' players have changed place</li>';
    $m++;
}
print_r($cluster);
?>
A: 

EDIT: disregard, I still get the same result as you, everyone winds up in cluster 4. I shall reconsider my code and try again.

I think I've realised what the problem is, k-means clustering is designed to break up differences in a set, however, because of the way you calculate averages etc. we are getting a situation where there are no large gaps in the ranges.

Might I suggest a change and only concentrate on a single value(strength appears to make most sense to me) to determine the clusters, or abandon this sorting method altogether, and adopt something different(not what you want to hear I know)?

I found a rather nice site with an example k-mean sort using integers, I'm going to try and edit that, I will get back with the results some time tomorrow.

http://code.blip.pt/2009/04/06/k-means-clustering-in-php/ <-- link I mentioned and forgot about.

scragar
It would be cool if you posted the URL for that example, I'm also interested in it.
Alix Axel
Thank you scragar for your try. I'm looking forward to seeing a second approach from you. :)No, I don't want to focus on one single value. :) If it is possible with integers, it must also be possible with floating point values ...
+1 for the good link. If it doesn't help me, it can surely help other people. It's short and well-documented.
A: 

It's so embarassing :D I think the whole problem is caused by only one letter:

In assignPlayersToNearestCluster() you can find arsort($abstaende);. After that, the function each() takes the first value. But it's arsort so the first value must be the highest. So it picks the cluster which has the highest distance value.

So it should be asort, of course. :) To prove that, I've tested it with asort - and I get convergence after 7 iterations. :)

Do you think that was the mistake? If it was, then my problem is solved. In that case: Sorry for annoying you with that stupid question. ;)

Questions, may it be stupid or not, when answered correctly, makes us wiser.
Randell
Yes, you're right. :) Here you can learn that you shouldn't mix up arsort() and asort() AND you find general k-means code for PHP. :)