I have need to do some cluster analysis on a set of 2 dimensional data (I may add extra dimensions along the way).
The analysis itself will form part of the data being fed into a visualisation, rather than the inputs into another process (e.g. Radial Basis Function Networks).
To this end, I'd like to find a set of clusters which primarily "looks right", rather than elucidating some hidden patterns.
My intuition is that k-means would be a good starting place for this, but that finding the right number of clusters to run the algorithm with would be problematic.
The problem I'm coming to is this:
How to determine the 'best' value for k such that the clusters formed are stable and visually verifiable?
Questions:
- Assuming that this isn't NP-complete, what is the time complexity for finding a good k. (probably reported in number of times to run the k-means algorithm).
- is k-means a good starting point for this type of problem? If so, what other approaches would you recommend. A specific example, backed by an anecdote/experience would be maxi-bon.
- what short cuts/approximations would you recommend to increase the performance.