Does anyone know a good algorithm for clustering on both discrete and continuous attributes? I am working on a problem of identifying groups of similar customers, and each customer has both discrete and continuous attributes (think type of customer, amount of revenue generated by the customer, geographic location, etc.).

Traditionally, algorithms like K-means or EM work on continuous attributes. What if we have a mix of continuous and discrete attributes?

+3  A: 

If I remember correctly, the COBWEB algorithm can work with discrete attributes.

You can also apply various 'tricks' to the discrete attributes in order to create meaningful distance metrics.
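One common trick of this kind, sketched here in Python as an illustration rather than as what the answer necessarily had in mind, is to one-hot encode each discrete attribute and rescale each continuous attribute to [0, 1], so that an ordinary Euclidean distance can be used. The customer records and the helper names (`one_hot`, `min_max`, `euclidean`) are made up for the example.

```python
import math

def one_hot(value, categories):
    """Return a 0/1 indicator vector for value over the known categories."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max(value, lo, hi):
    """Rescale a continuous value to [0, 1]; constant columns map to 0."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)

# Hypothetical customers: (customer type, revenue)
customers = [("retail", 1000.0), ("retail", 1200.0), ("wholesale", 9000.0)]
types = sorted({t for t, _ in customers})
revenues = [r for _, r in customers]
lo, hi = min(revenues), max(revenues)

# Each customer becomes a purely numeric vector:
# one indicator per customer type, followed by the rescaled revenue.
vectors = [one_hot(t, types) + [min_max(r, lo, hi)] for t, r in customers]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With this encoding, a mismatch on a discrete attribute always contributes the same fixed amount to the distance, so how much weight it gets relative to the continuous attributes is a modelling choice you may need to tune.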

You could search for clustering of categorical/discrete attributes; one of the first hits is "ROCK: A Robust Clustering Algorithm for Categorical Attributes".

Anonymous
A: 

You could also look at affinity propagation as a possible solution. But to overcome the continuous/discrete dilemma, you need to define a function that assigns values to the discrete states so that they can be compared.

nasmorn
A: 

I would actually present pairs of the discrete attribute values to users and ask them to rate their proximity, on a scale ranging from "synonym" to "very foreign" or similar. With many people doing this, you will end up with a widely accepted proximity function for the non-linear attribute values.
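A minimal Python sketch of how such ratings might be turned into a proximity function, assuming a numeric scale from 0 ("synonym") to 4 ("very foreign"). The ratings, the scale, and the names `build_proximity` and `dissimilarity` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey responses: (value_a, value_b, rating), where the rating
# runs from 0 (synonym) to 4 (very foreign).
ratings = [
    ("retail", "wholesale", 3),
    ("wholesale", "retail", 4),
    ("retail", "online", 1),
    ("online", "retail", 2),
    ("wholesale", "online", 4),
]
MAX_RATING = 4

def build_proximity(ratings, max_rating):
    """Average the ratings per unordered pair and scale them to [0, 1]."""
    collected = defaultdict(list)
    for a, b, score in ratings:
        collected[frozenset((a, b))].append(score)
    return {pair: mean(scores) / max_rating for pair, scores in collected.items()}

table = build_proximity(ratings, MAX_RATING)

def dissimilarity(a, b):
    """Look up the averaged dissimilarity; identical values are at distance 0."""
    if a == b:
        return 0.0
    return table[frozenset((a, b))]
```

Pairs that nobody rated raise a `KeyError` here; in practice you would need a policy for them, such as a default maximum dissimilarity.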

Ralph Rickenbach
+1  A: 

R is a great tool for clustering: the standard approach would be to calculate a dissimilarity matrix on your mixed data using daisy, then cluster on that matrix using agnes (both are in the cluster package).
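To show the idea behind that two-step approach, here is a rough pure-Python sketch. It is not R's implementation. For mixed data, daisy uses Gower's coefficient, and agnes performs agglomerative hierarchical clustering; the sketch assumes average linkage, equal column weights, and no missing values. The customer rows and function names are made up.

```python
def gower_matrix(rows, numeric_cols):
    """Pairwise Gower dissimilarity: range-scaled absolute difference for
    numeric columns, simple mismatch (0 or 1) for all other columns."""
    n, p = len(rows), len(rows[0])
    ranges = {}
    for j in numeric_cols:
        col = [r[j] for r in rows]
        ranges[j] = max(col) - min(col)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(p):
                if j in ranges:
                    # A constant numeric column contributes nothing.
                    if ranges[j] > 0:
                        total += abs(rows[a][j] - rows[b][j]) / ranges[j]
                else:
                    total += 0.0 if rows[a][j] == rows[b][j] else 1.0
            d[a][b] = d[b][a] = total / p
    return d

def average_linkage(d, k):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    dissimilarity until only k clusters remain."""
    clusters = [[i] for i in range(len(d))]

    def dist(c1, c2):
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        x, y = min(pairs, key=lambda xy: dist(clusters[xy[0]], clusters[xy[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Hypothetical customers: (type, revenue, region)
rows = [
    ("retail", 1000.0, "north"),
    ("retail", 1200.0, "north"),
    ("wholesale", 9000.0, "south"),
    ("wholesale", 8500.0, "south"),
]
d = gower_matrix(rows, numeric_cols={1})
groups = average_linkage(d, k=2)
```

Here `groups` holds lists of row indices, one list per cluster. The naive merge loop is fine for a handful of rows but far too slow for real customer data, where the R functions or an equivalent library would be the better choice.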

The cba package on CRAN includes a function for clustering on binary predictors based on ROCK.

bubaker