Recently I was faced with this interview question (K-Means Clustering solution). The design I came up with did not meet the expectations of the interviewer (to put simply I didnt get the job because I lost to another candidate on this design problem). I am wondering how many different / efficient / simply solutions can the SO community come up with (by doing this I am hoping to hone my skills):
To implement a simple algorithm to cluster people according to their weight and height. The data set includes a list of people with their weights and heights like so:
Person Weight Height
(kg) (inches)
Person 1 70 70
Person 2 75 80
Person 3 120 85
You can plot the data as a 2 dimensional data. Weight being one dimension and height being the other dimension. Weight can range from a minimum of 50kg to 150kg. Height can range from a minimum of 38inches to 90inches
Algorithm:
The algorithm (called K-means clustering) will cluster data into K groups goes as such:
Start with K clusters. Each cluster is defined by its center point which will start of as random weight and random height. Pick random numbers from within the corresponding ranges defined above.
For each person Calculate distance to center of each cluster using formula distance = sqrt(pow((wperson−wcenter), 2) + (pow(hperson−hcenter),2)) where wperson = weight of person, hperson = height of person wcenter = weight of cluster center point, hcenter = height of cluster center point
Assign the person to the cluster with the shortest distance to center point of cluster
After end of step 2, you will end up with K clusters each assigned with a set of people
For each cluster, set the weight and height of the center point to the average of the people in the cluster wcenter = (sum of weight of each person in cluster)/(number of people in cluster) hcenter = (sum of height of each person in cluster)/number of people in cluster)
Repeat steps 2 to 5 for 1000 iterations, then print out following information for each cluster.
weight and height of center of cluster. list of people in cluster.
I am not looking for a implementation/solution but for a high level design. can you list the interfaces / classes etc. I dont want to give my solution now, but will post it later in the day?