ansaurus

Question

Design Problem

Answer 1

A:

I'm not sure what your question actually is; the steps you point out effectively define the algorithm you're talking about.

A better idea may be to include exactly what you did, so that people can give you hints and tips as to where you might have gone wrong or what they would have done differently.

Robin Day 2009-05-27 13:32:25

Answer 2

+1 A:

Well, I would first tackle all the constants/magic numbers that reduce the reusability of the algorithm:

instead of a fixed number of iterations, use a stopping criterion (e.g., if clusters don't change too much, terminate)
don't restrict yourself to 2-dim data, use vectors
let the user define the number of clusters to be found

Then, you could hide some specifics behind interfaces, e.g. the distance might be calculated differently (for example, it might at some point have to cope with values other than double).

On the other hand, if you really have this simple problem, some of these generalizations might well be overkill - but that's what I would discuss with someone telling me to implement this algorithm.
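As a rough illustration of the points above, here is a minimal Python sketch of a pluggable distance and a tolerance-based stopping check for points of any dimension. The names `nearest_center`, `has_converged`, `manhattan` and the default `tolerance` are assumptions made for this example; they do not come from the original question.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]  # the "interface" for distances


def euclidean(a: Vector, b: Vector) -> float:
    # Works for any number of dimensions, not just 2.
    return math.dist(a, b)


def manhattan(a: Vector, b: Vector) -> float:
    # An alternative distance, to show that the metric can be swapped.
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_center(point: Vector, centers: Sequence[Vector],
                   distance: Distance = euclidean) -> int:
    # Index of the center closest to the point under the given distance.
    return min(range(len(centers)), key=lambda i: distance(point, centers[i]))


def has_converged(old_centers: Sequence[Vector], new_centers: Sequence[Vector],
                  tolerance: float = 1e-6, distance: Distance = euclidean) -> bool:
    # Stopping criterion: no center moved by more than `tolerance`
    # (the default value is an arbitrary assumption).
    return all(distance(o, n) <= tolerance
               for o, n in zip(old_centers, new_centers))
```

A caller would loop "assign points, recompute centers" until `has_converged` returns true, with the number of clusters being simply the length of the `centers` list the user passes in.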

__roland__ 2009-05-27 13:35:46

I don't understand what you mean by "don't restrict yourself to 2-dim data, use vectors". I agree with your logic to terminate when the clusters converge rather than going through all 1000 iterations.

newbee 2009-05-27 13:59:19

I just meant that you might run into a similar problem where you have to cluster users w.r.t. height, age, weight (= three dimensions), or height, age, weight, net income (= four dimensions), etc. So why not generalize the algorithm to cope with n-dimensional vectors, i.e. arbitrarily many properties of users/entities?
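To make this concrete, here is a small Python sketch that turns a user record into a vector with as many dimensions as there are chosen properties. The helper `to_vector`, the property names and the sample values are all made up for illustration.

```python
def to_vector(user, features):
    # Pick the named properties, in order, and return them as an n-dim point.
    return tuple(float(user[name]) for name in features)


# Hypothetical sample data, not taken from the question.
users = [
    {"height": 180, "age": 30, "weight": 75, "net_income": 40000},
    {"height": 165, "age": 45, "weight": 60, "net_income": 52000},
]

three_dim = [to_vector(u, ("height", "age", "weight")) for u in users]
four_dim = [to_vector(u, ("height", "age", "weight", "net_income")) for u in users]
```

The clustering code then only ever sees tuples of numbers and does not need to know how many properties there are.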

__roland__ 2009-05-27 14:34:30

Answer 3

A:

That sounds like a really good way to do it. K-means will usually converge quickly (though not necessarily to the global optimum), so my one suggestion would be to run the algorithm until no more changes occur, rather than a fixed number of 1000 iterations. You could then repeat the entire process a few times with different random starting points.
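A minimal Python sketch of this suggestion, assuming Euclidean distance and points given as tuples: each run stops once the assignments no longer change, and the best of several randomly seeded runs is kept. The function names and the `restarts` and `seed` parameters are assumptions for this example.

```python
import math
import random


def kmeans_once(points, k, rng):
    # Start from k distinct input points chosen at random.
    centers = rng.sample(points, k)
    assignment = None
    while True:
        new_assignment = [
            min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so we are done
        assignment = new_assignment
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    # Total squared distance to the assigned centers; lower is better.
    cost = sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assignment))
    return centers, assignment, cost


def kmeans(points, k, restarts=10, seed=0):
    # Repeat with different random starting points and keep the cheapest run.
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[2])
```

Keeping the run with the lowest cost is one common way to choose between restarts; it reduces the chance of ending up in a poor local optimum but does not guarantee the global one.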

One weakness of k-means is that it does require you to specify (i.e. guess) an appropriate value for k up-front. I think you would get points for asking the interviewer what an appropriate value for k would be, or, if there is no way to know, describing some goodness-of-fit measure and then calculating that measure for different values of k to find a "just low enough" value.
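One possible goodness-of-fit measure is the within-cluster sum of squared distances. The Python sketch below computes it for a given clustering and then picks the smallest k after which adding another cluster improves the measure by less than a threshold. The clustering itself is assumed to be done elsewhere; the helper names and the 10% threshold are arbitrary assumptions, not something the answer prescribes.

```python
import math


def within_cluster_ss(clusters):
    # `clusters` is a list of clusters, each a list of points (tuples).
    total = 0.0
    for members in clusters:
        if not members:
            continue
        center = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(math.dist(p, center) ** 2 for p in members)
    return total


def smallest_adequate_k(score_by_k, min_relative_gain=0.1):
    # `score_by_k` maps each tried k to its within-cluster sum of squares.
    # Return the first k where going to the next k gains less than the threshold.
    ks = sorted(score_by_k)
    for k, next_k in zip(ks, ks[1:]):
        current = score_by_k[k]
        if current == 0 or (current - score_by_k[next_k]) / current < min_relative_gain:
            return k
    return ks[-1]
```

The measure always shrinks as k grows, so the point is to look for where the improvement levels off rather than for the minimum.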

j_random_hacker 2009-05-27 13:36:48

The question I was posed was not to discuss the merits or demerits of k-means but simply to come up with a design.

newbee 2009-05-27 14:01:07

When proposing a design, you need to consider its merits and demerits. Don't you think?

j_random_hacker 2009-05-27 14:07:25

Answer 4

+1 A:

You can create the following classes:

Person to store data about persons and centers. Properties: id, weight and height. Method: calculateDistance.
Cluster to store one center and a list of persons. Properties: center and list of Person. Method: calculateCenter.
KCluster to hold your algorithm and store a list of clusters. Property: list of Cluster. Method: generateClusters.
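The three classes listed above might look roughly like this in Python. This is only a sketch of the design: the snake_case method names, the `id=-1` used for computed centers, the `seed` parameter and the "stop when membership no longer changes" rule are assumptions added for the example.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    id: int
    weight: float
    height: float

    def calculate_distance(self, other: "Person") -> float:
        # Euclidean distance in the (weight, height) plane.
        return math.hypot(self.weight - other.weight, self.height - other.height)


@dataclass
class Cluster:
    center: Person
    persons: list = field(default_factory=list)

    def calculate_center(self) -> None:
        # Move the center to the mean of the members; keep it if there are none.
        if not self.persons:
            return
        n = len(self.persons)
        self.center = Person(
            id=-1,  # assumption: computed centers get a placeholder id
            weight=sum(p.weight for p in self.persons) / n,
            height=sum(p.height for p in self.persons) / n,
        )


class KCluster:
    def __init__(self, k: int, seed: int = 0):
        self.k = k
        self.rng = random.Random(seed)
        self.clusters = []

    def generate_clusters(self, persons):
        # Use k randomly chosen persons as the initial centers.
        self.clusters = [Cluster(center=p) for p in self.rng.sample(persons, self.k)]
        previous = None
        while True:
            for cluster in self.clusters:
                cluster.persons = []
            for person in persons:
                nearest = min(self.clusters,
                              key=lambda c: person.calculate_distance(c.center))
                nearest.persons.append(person)
            membership = [[p.id for p in c.persons] for c in self.clusters]
            if membership == previous:
                return self.clusters  # nobody changed cluster
            previous = membership
            for cluster in self.clusters:
                cluster.calculate_center()
```

For example, `KCluster(k=2).generate_clusters(persons)` returns the list of `Cluster` objects, each holding its center and its members.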

fbinder 2009-05-27 13:37:48

This is very similar to what I came up with, except that I had a few more abstract classes / interfaces to make the solution extensible (that was one piece of feedback I received on my initial design). I have to tell you that the design was termed sloppy!

newbee 2009-05-27 14:06:50

Maybe you had too many?

DrG 2009-05-27 14:38:39

Were you explicitly asked to create an extensible design?

fbinder 2009-05-27 15:22:52

Answer 5

+2 A:

Chap 2009-05-27 14:43:39
