
A question about 20 questions games was asked here:

However, if I'm understanding it correctly, the answers seem to assume that each question goes down a hierarchical branching tree. A binary tree should work if the game went like this:

  1. Is it an animal? Yes.
  2. Is it a mammal? Yes.
  3. Is it a feline? Yes.

That works because felines are mammals and mammals are animals. But what if the questions go like this?

  1. Is it a mammal? Yes.
  2. Is it a predator? Yes.
  3. Does it have a long nose? No.

You can't branch down a tree with questions like these, because plenty of predators aren't mammals. So the program can't simply narrow things down to mammals and then treat predators as a subset of mammals.

So is there a way to use a binary search tree that I'm not understanding, or is there a different algorithm for this problem?

Just to clarify, I'm only using 20 questions as an example, so my question is about this kind of search problem in general, not other problems involved specifically in a 20 questions game.

A:

It's likened to a binary search in that each question is yes/no, so every answer partitions your remaining set into two parts. However, the data set would likely not be stored in an actual binary tree, because, as you realize, that would only work if the questions were always asked in the same order as the tree's split dimensions.

Also, you could easily have more than 20 dimensions ('properties') on which to split things, and some combination of those twenty could be shared by more than one object (so a leaf node of a 20-level binary tree wouldn't necessarily contain just one item).

Thus, the "binary search" is just a metaphor for what's actually going on, in that at each step you try to pick the property which best splits your remaining set into two equal halves. As far as actual data structures go, you'd have to use something else.
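A minimal Python sketch of that greedy step, assuming each object is stored as a hypothetical dict of yes/no properties (the animals and values below are illustrative only). At each turn it picks the unasked property whose "yes" count is closest to half the remaining candidates, then filters on the answer. No tree is stored; the candidate set is simply re-partitioned after every answer, so questions can come in any order.

```python
# Hypothetical candidate set: each object maps to its yes/no properties.
# The property values are illustrative only.
CANDIDATES = {
    "cat":      {"mammal": True,  "predator": True,  "long_nose": False},
    "dog":      {"mammal": True,  "predator": True,  "long_nose": True},
    "elephant": {"mammal": True,  "predator": False, "long_nose": True},
    "shark":    {"mammal": False, "predator": True,  "long_nose": True},
    "eagle":    {"mammal": False, "predator": True,  "long_nose": False},
    "cow":      {"mammal": True,  "predator": False, "long_nose": False},
}


def best_question(candidates, asked):
    """Pick the unasked property whose yes/no split is closest to 50/50."""
    properties = {p for props in candidates.values() for p in props} - asked
    if not properties:
        return None

    def imbalance(prop):
        # A missing property counts as "no".
        yes = sum(1 for props in candidates.values() if props.get(prop, False))
        return abs(2 * yes - len(candidates))  # 0 means a perfect half split

    # sorted() makes ties deterministic (alphabetical order wins).
    return min(sorted(properties), key=imbalance)


def filter_candidates(candidates, prop, answer):
    """Keep only the candidates consistent with one yes/no answer."""
    return {name: props for name, props in candidates.items()
            if props.get(prop, False) == answer}


def play(candidates, secret):
    """Simulate a game where the player truthfully answers for `secret`."""
    asked, remaining = set(), dict(candidates)
    while len(remaining) > 1:
        prop = best_question(remaining, asked)
        if prop is None:  # out of questions; several objects may be left
            break
        asked.add(prop)
        answer = candidates[secret].get(prop, False)
        remaining = filter_candidates(remaining, prop, answer)
    return sorted(remaining)


print(best_question(CANDIDATES, set()))  # long_nose
print(play(CANDIDATES, "cat"))           # ['cat']
```

If two objects share every property, `play` stops with both still in the list, which matches the point about leaves holding more than one item.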

tzaman
A: 

If you needed to stick with a binary tree for the problem, there's nothing saying that you can't duplicate a branch or a node. Place the feline answer node at the end of more than one set of decisions. Or ask the predator question twice - once if the user said "yes" to mammal, and once if the user said "no".

Certainly there are optimization and efficiency concerns if you take this tack, but there are ways of addressing specific concerns as well. (For example, if you're worried about storage space for the decision tree, then make the branches or the nodes or both pointers to immutable objects/declarations).
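A rough Python sketch of that idea, using frozen (immutable) dataclasses so the same node object can be referenced from more than one branch without copying. The `Leaf`/`Question` classes, the animals, and the "house cat answered as not a predator" path are hypothetical illustrations, not something from the answer itself.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    answer: str


@dataclass(frozen=True)
class Question:
    prop: str
    yes: "Node"
    no: "Node"


Node = Union[Leaf, Question]

# One immutable leaf object, referenced from two different paths.
FELINE = Leaf("feline")

TREE = Question(
    "mammal",
    yes=Question(
        "predator",
        yes=Question("long_nose", yes=Leaf("wolf"), no=FELINE),
        # Hypothetically, some players say "no" to predator for a house cat,
        # so the same FELINE node is also placed on this path.
        no=Question("pet", yes=FELINE, no=Leaf("cow")),
    ),
    # The predator question is asked again on the "not a mammal" branch.
    no=Question("predator", yes=Leaf("shark"), no=Leaf("turtle")),
)


def walk(node, answers):
    """Follow yes/no answers (a dict of prop -> bool) down to a leaf."""
    while isinstance(node, Question):
        node = node.yes if answers[node.prop] else node.no
    return node.answer


print(walk(TREE, {"mammal": True, "predator": False, "pet": True}))  # feline
```

Sharing avoids copying nodes, but every distinct path through the questions still has to be spelled out in the tree.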

Greg Harman
Won't the tree grow exponentially that way and get huge really fast? I'm just a beginner, but it seems like it would be easier to just iterate over every possible answer and check them one at a time than to do that.
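For comparison, the brute-force approach from this comment might look like the following Python sketch (the data is hypothetical): keep every candidate and, after each answer, discard those that disagree with any answer given so far. Because it only checks answers against stored properties, the order in which questions are asked doesn't matter; the cost is a full scan of all candidates per query.

```python
# Hypothetical data: each animal's yes/no properties.
ANIMALS = {
    "cat":      {"mammal": True,  "predator": True,  "long_nose": False},
    "dog":      {"mammal": True,  "predator": True,  "long_nose": True},
    "elephant": {"mammal": True,  "predator": False, "long_nose": True},
    "shark":    {"mammal": False, "predator": True,  "long_nose": True},
    "eagle":    {"mammal": False, "predator": True,  "long_nose": False},
}


def consistent(animals, answers):
    """Linear scan: keep every animal that agrees with all answers so far."""
    return [name for name, props in animals.items()
            if all(props.get(q, False) == a for q, a in answers.items())]


print(consistent(ANIMALS, {"mammal": True, "predator": True}))  # ['cat', 'dog']
# Answers given in any order are handled the same way.
print(consistent(ANIMALS, {"long_nose": False, "predator": True, "mammal": True}))  # ['cat']
```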
lala
A: 

If you are looking for an exact match, just hash on all the properties and do a lookup.
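A minimal Python sketch of the exact-match lookup, assuming a fixed, hypothetical list of properties and that every property has been answered. The answers are turned into a tuple in a fixed order and used as a dict key, so the order in which the questions were asked doesn't matter. Each key maps to a list, since several objects can share the same combination of properties (the hypothetical "cat" and "lion" entries below do).

```python
from collections import defaultdict

# Hypothetical fixed property order; every property must be answered.
PROPERTIES = ("mammal", "predator", "long_nose")

ANIMALS = {
    "cat":      {"mammal": True,  "predator": True,  "long_nose": False},
    "lion":     {"mammal": True,  "predator": True,  "long_nose": False},
    "elephant": {"mammal": True,  "predator": False, "long_nose": True},
    "shark":    {"mammal": False, "predator": True,  "long_nose": True},
}


def key(answers):
    """Turn a full set of answers into a hashable tuple in a fixed order."""
    return tuple(answers[p] for p in PROPERTIES)


# Several objects may share a key, so each key maps to a list of names.
INDEX = defaultdict(list)
for name, props in ANIMALS.items():
    INDEX[key(props)].append(name)


def lookup(answers):
    """Exact-match lookup; .get() avoids inserting empty lists for misses."""
    return INDEX.get(key(answers), [])


print(lookup({"predator": True, "long_nose": False, "mammal": True}))  # ['cat', 'lion']
```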

If you want to do pattern recognition to find similar items, you can use a method with a fairly 'linear' mapping, like k-nearest neighbour. For instance, you can use a kd-tree to represent the search space.
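A small Python sketch of nearest-neighbour matching on yes/no answers, using Hamming distance (the number of answers that disagree). Note that it does a brute-force scan rather than building a kd-tree, which is fine for a handful of items; the data is hypothetical. Unlike an exact hash lookup, it still returns something useful when the answers match no object exactly.

```python
# Hypothetical items as tuples of yes/no answers (1 = yes, 0 = no) for the
# properties (mammal, predator, long_nose).
ITEMS = {
    "cat":      (1, 1, 0),
    "dog":      (1, 1, 1),
    "elephant": (1, 0, 1),
    "shark":    (0, 1, 1),
    "eagle":    (0, 1, 0),
}


def hamming(a, b):
    """Number of positions where two answer vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def k_nearest(items, query, k):
    """Brute-force k-NN: sort by distance, breaking ties by name."""
    return sorted(items, key=lambda name: (hamming(items[name], query), name))[:k]


# (1, 0, 0) matches nothing exactly; cat and elephant each differ in one answer.
print(k_nearest(ITEMS, (1, 0, 0), 2))  # ['cat', 'elephant']
```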

disown
"Hash on"? Is that programmer's slang for something?
lala
"Hash on" == put the values in a hash table.
tzaman
A: 

I believe what you're looking for is more commonly referred to as a Decision Tree, specifically for classification. You can then use algorithms like C4.5 to learn how to order your questions to classify efficiently.
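To give a feel for how such a learner orders questions, here is a simplified ID3-style sketch in Python: at each node it picks the property with the highest information gain (the largest reduction in entropy of the labels). C4.5 builds on this with gain ratio, continuous attributes, missing values and pruning, none of which are shown here; the data and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training rows: (yes/no properties, label).
ROWS = [
    ({"mammal": True,  "predator": True,  "long_nose": False}, "cat"),
    ({"mammal": True,  "predator": True,  "long_nose": True},  "dog"),
    ({"mammal": True,  "predator": False, "long_nose": True},  "elephant"),
    ({"mammal": False, "predator": True,  "long_nose": True},  "shark"),
    ({"mammal": False, "predator": True,  "long_nose": False}, "eagle"),
]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def split(rows, attr):
    yes = [r for r in rows if r[0][attr]]
    no = [r for r in rows if not r[0][attr]]
    return yes, no


def info_gain(rows, attr):
    """Entropy of the labels minus the weighted entropy after splitting."""
    n = len(rows)
    after = sum(len(part) / n * entropy([label for _, label in part])
                for part in split(rows, attr) if part)
    return entropy([label for _, label in rows]) - after


def build(rows, attrs):
    """Return a label (leaf) or a tuple (attr, yes_subtree, no_subtree)."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = max(sorted(attrs), key=lambda a: info_gain(rows, a))
    yes, no = split(rows, best)
    if not yes or not no:  # question doesn't split these rows; try the others
        return build(rows, attrs - {best})
    rest = attrs - {best}
    return (best, build(yes, rest), build(no, rest))


def classify(tree, answers):
    """Walk the learned tree using a dict of prop -> bool answers."""
    while isinstance(tree, tuple):
        attr, yes, no = tree
        tree = yes if answers[attr] else no
    return tree


TREE = build(ROWS, {"mammal", "predator", "long_nose"})
print(classify(TREE, {"mammal": True, "predator": True, "long_nose": False}))  # cat
```

Here "predator" is never chosen as the first question because four of the five animals answer "yes", so it tells you less than a closer-to-even split does.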

Brad