Hi,
I have a set of training data consisting of 20 multiple choice questions (A/B/C/D) answered by a hundred respondents. The answers are purely categorical and cannot be scaled to numerical values. 50 of these respondents were selected for free product trial. The selection process is not known. What interesting knowledge can be mined from this information?
The following is a list of what I have come up with so far-
- A study of percentages (Example - Percentage of people who answered B on Qs.5 and got selected for free product trial)
- Conditional probabilities (Example - What is the probability that a person will get selected for free product trial given that he answered B on Qs.5)
- Naive Bayesian classifier (This can be used to predict whether a person will be selected or not for a given set of values for any subset of questions).
Can you think of any other interesting analysis or data-mining activities that can be performed?
The usual suspects like correlation can be eliminated as the response is not quantifiable/scoreable.
Is my approach correct?