I have a set of ~10K objects, each with approximately 150 distinct properties, about a quarter of which are multivalued and/or related to other properties.
I have a set of about 120 categories that I would like to sort these objects into, with each category defined by a 'template' object. If an object matches a template exactly, it clearly belongs in that category. However, only about 10% of the objects have a template that matches exactly. As a result, I would like to score each object by its similarity to every category and sort it into its best match. I'd also like to identify clusters of objects that are very similar to one another, since these would indicate the potential for a new or refined category.
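To make the scoring idea concrete, here is a minimal sketch in Python of roughly what I have in mind. Everything in it is an assumption for illustration only: objects and templates are plain dicts, multivalued properties are frozensets compared by Jaccard overlap, all other properties are compared by exact equality, and the names `property_similarity`, `score`, `best_category`, and the `weights` dict are hypothetical. It is not taken from Weka, RapidMiner, or any other tool.

```python
from typing import Dict, FrozenSet, Tuple, Union

Value = Union[str, int, float, FrozenSet[str]]
Obj = Dict[str, Value]


def property_similarity(a: Value, b: Value) -> float:
    """Jaccard overlap for multivalued (frozenset) properties, exact match otherwise."""
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        union = a | b
        return len(a & b) / len(union) if union else 1.0
    return 1.0 if a == b else 0.0


def score(obj: Obj, template: Obj, weights: Dict[str, float]) -> float:
    """Weighted average similarity over the properties the template defines.

    Properties missing from `weights` get weight 1.0 (an assumed default).
    Properties the object lacks contribute zero similarity.
    """
    total = sum(weights.get(p, 1.0) for p in template)
    if total == 0:
        return 0.0
    matched = sum(
        weights.get(p, 1.0) * property_similarity(obj[p], template[p])
        for p in template
        if p in obj
    )
    return matched / total


def best_category(
    obj: Obj, templates: Dict[str, Obj], weights: Dict[str, float]
) -> Tuple[str, float]:
    """Return the (category name, score) pair with the highest score."""
    scored = ((name, score(obj, t, weights)) for name, t in templates.items())
    return max(scored, key=lambda pair: pair[1])


# Made-up example data, two properties only.
templates = {
    "widget": {"color": "red", "tags": frozenset({"a", "b"})},
    "gadget": {"color": "blue", "tags": frozenset({"c"})},
}
weights = {"color": 2.0}  # "color" counts twice as much as unlisted properties
obj = {"color": "red", "tags": frozenset({"a", "c"})}

print(best_category(obj, templates, weights))
```

An exact match scores 1.0 under this scheme, and changing the analysis would mean editing `weights` or swapping out `property_similarity`. The same `score` function could presumably be applied between pairs of objects as a starting point for spotting clusters, though I don't know whether that scales sensibly to 10K objects.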
This seems like a job for Weka, RapidMiner, or another machine learning, clustering, or classification system. However, I'm having difficulty finding good introductory material on this domain, and as a result I can't tell how much effort it would take to apply these tools to this case. Given that this could be an ongoing need, I would like to use something that lets me easily change the analysis method, weights, etc.
Thoughts?