I have a set of ~10K objects, each with approximately 150 distinct properties, about a quarter of which are multivalued and/or related to other properties.

I have a set of about 120 categories that I would like to sort these objects into, with each category defined as a 'template' object. If an instance matches a template exactly, that object clearly belongs in that category. However, only about 10% of the objects have a template that is an exact match. As a result, I would like to be able to score objects by their similarity to each category and sort them into their best match. I'd also like to identify clusters of objects that are very similar, which would indicate the potential for a new or refined category.
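One way to frame the template-matching part is nearest-template classification. Give each property a similarity in [0, 1] (exact match for single values, Jaccard index for multivalued ones), take a weighted average over the template's properties, and assign each object to the highest-scoring template. This is a minimal sketch, assuming objects are plain Python dicts with sets for multivalued properties. The function names (`property_similarity`, `score`, `best_category`) and the example templates are hypothetical, and the sketch ignores the relationships between properties mentioned above.

```python
from typing import Any, Dict, Tuple

# Assumed representation (hypothetical): an object or template is a dict mapping
# property name -> value; multivalued properties are stored as Python sets.
Obj = Dict[str, Any]


def property_similarity(a: Any, b: Any) -> float:
    """Return a similarity in [0, 1] for two values of the same property."""
    if a is None or b is None:
        return 0.0  # property missing on one side
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        union = a | b
        # Jaccard index for multivalued properties; two empty sets count as equal
        return len(a & b) / len(union) if union else 1.0
    return 1.0 if a == b else 0.0  # exact match for single-valued properties


def score(obj: Obj, template: Obj, weights: Dict[str, float]) -> float:
    """Weighted average similarity over the properties the template defines.

    Properties not listed in `weights` get weight 1.0; properties present on
    the object but not on the template are ignored.
    """
    total = weight_sum = 0.0
    for prop, template_value in template.items():
        w = weights.get(prop, 1.0)
        total += w * property_similarity(obj.get(prop), template_value)
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


def best_category(obj: Obj, templates: Dict[str, Obj],
                  weights: Dict[str, float]) -> Tuple[str, float]:
    """Return (category name, score) of the highest-scoring template."""
    return max(((name, score(obj, t, weights)) for name, t in templates.items()),
               key=lambda pair: pair[1])


# Hypothetical example data
templates = {
    "red_widget": {"color": "red", "tags": {"metal", "small"}},
    "blue_gadget": {"color": "blue", "tags": {"plastic"}},
}
obj = {"color": "red", "tags": {"metal", "large"}}

name, s = best_category(obj, templates, weights={"color": 2.0})
print(name, round(s, 3))  # red_widget 0.778
```

An object that matches its template exactly scores 1.0. Because the weights are a plain dict, re-weighting the analysis only means passing a different dict.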

This seems like a job for Weka, RapidMiner or another machine learning/clustering/classification system. However, I'm having difficulty finding good introductory material on this domain, and as a result I can't tell how much effort it would take to use these tools in this case. Given that this could be an ongoing need, I would like to use something that lets me easily change the analysis method, weights, etc.
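For spotting groups of very similar objects that might justify a new or refined category, a simple starting point is threshold-based single-linkage clustering, sketched below. The similarity function is passed in as a parameter, so the analysis method can be swapped without touching the clustering code. The default, a Jaccard index over (property, value) pairs, is an assumption for illustration, and all names here are hypothetical. The sketch compares every pair of objects, so it is O(n²) and would be slow in pure Python on all ~10K objects; Weka and RapidMiner offer other clustering algorithms (k-means, for example) for larger runs.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List

# Assumed representation (hypothetical): property name -> value, with sets for
# multivalued properties; all values must be hashable.
Obj = Dict[str, Any]


def pair_jaccard(a: Obj, b: Obj) -> float:
    """Default similarity (an assumption): Jaccard index over (property, value)
    pairs, where each element of a multivalued property is its own pair."""
    def pairs(o: Obj) -> set:
        out = set()
        for prop, val in o.items():
            if isinstance(val, (set, frozenset)):
                out.update((prop, v) for v in val)
            else:
                out.add((prop, val))
        return out

    pa, pb = pairs(a), pairs(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0


def threshold_clusters(objs: List[Obj], threshold: float,
                       similarity: Callable[[Obj, Obj], float] = pair_jaccard
                       ) -> List[List[int]]:
    """Single-linkage grouping: i and j share a cluster if a chain of pairs with
    similarity >= threshold connects them. Compares O(n^2) pairs.
    Returns lists of object indices, largest cluster first."""
    parent = list(range(len(objs)))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(objs)), 2):
        ri, rj = find(i), find(j)
        # Skip the similarity call if i and j are already linked
        if ri != rj and similarity(objs[i], objs[j]) >= threshold:
            parent[ri] = rj

    groups: Dict[int, List[int]] = {}
    for i in range(len(objs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical example data
example = [
    {"color": "red", "tags": {"metal", "small"}},
    {"color": "red", "tags": {"metal", "small", "shiny"}},
    {"color": "blue", "tags": {"plastic"}},
]
print(threshold_clusters(example, threshold=0.7))  # [[0, 1], [2]]
```

Trying a weighted or domain-specific measure only means passing a different `similarity` argument; raising `threshold` gives tighter, smaller clusters.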

Thoughts?

A:

Let's talk.
If your remit is to categorise these objects, you could drive yourself mad doing it manually!

I am dissecting a similar dataset, and I always come back to the same point: these objects are basically the same.

The fuzzy logic that separates them is the holy grail, but the holy grail is fuzzy... :(
What can you do? Give your boss some fuzzy formulae? That will last a while.

You can spend a lifetime trying to find patterns and get nothing out of it. Why not try to shift the perspective to something you can quantify? Concentrate on outputs.

divinci
A: 

RapidMiner comes with an integrated online tutorial: just start RapidMiner and go to "Help" and then "RapidMiner Tutorial". You can also download a free PDF RapidMiner tutorial from the Rapid-I web page. There is also a short, free introductory RapidMiner video on the Rapid-I web page, and the services section of the same site lists many RapidMiner training courses.