views:

63

answers:

2

Consider the following scenario:

  1. Potential customers are presented with a questionnaire where they can select none, one or multiple answers for every question.
  2. An automated algorithm should recommend the optimal solution based on the customers' answers.

Example:

  1. There are 3 possible solutions S1, S2, S3
  2. The questionnaire contains 10 questions Q1, Q2…Q10
  3. Each question contains a variable number of possible answers where:
    • A1.1 is the first answer for question 1.
    • A3.2 is the second answer for question 3.
  4. I want to be able to model the following solutions based on the answers provided by the customer:
    • A1.1, A1.3, A2.1, A3.2 => S1
    • A1.1, A1.3, A2.2 => S1
    • A1.2 => S2
    • A2.2 => S2
    • A1.1, A3.1, A3.2 => S2
    • Any other combination => S3

In summary:

  • For given set of answers a solution must be recommended.
  • Solutions defined by the smaller number of answers should be preferred over the ones defined by larger number of answers.

I’m looking for an existing algorithm (and data model) for the problem presented above instead of attempting to write my own from scratch.

+1  A: 

The simplest thing to try that just might work is a nearest-neighbor algorithm: compute the similarity between a new set of answers and every set of answers with a known solution (weighting by total number of answers, if that's what you want), and offer the most-frequently-chosen known solution from the set of equally close set of answers.

If that doesn't perform well, then you want a more sophisticated classifier of some sort. You should look up decision trees (and their extensions, alternating decision trees and random forests) and Bayesian classifiers, among others.

You can find code for some of these things in machine learning or neural network toolboxes. Since you didn't specify a language, I can't point to one, but the algorithms (not code) are described in various books like The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman.

Rex Kerr
A: 

To me it seems more as a declarative logical program then a combinatorial or statistical problem. Just reverse your statements about which solution to choose from given answers and replace "=>" with ":-" and you get Prolog. These statements are Horner clauses and can be solved using SLD resolution algorithm given your rules are simple. There are many of the shelf solvers with bindings to different languages, so you can choose some of them.

Gabriel Ščerbák