views:

4904

answers:

5

Simple online games of 20 questions powered by an eerily accurate AI.

How do they guess so well?

+10  A: 

It is using a learning algorithm.

k-NN is a good example of one of these.

Wikipedia: k-Nearest Neighbor Algorithm
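Purely as an illustration (not the site's actual code), a nearest-neighbour matcher could store each object as a vector of past yes/no answers and rank candidates by a Hamming-style mismatch count, one dimension per question:

```python
# Toy k-NN sketch: 1 = yes, 0 = no, None = no data for that question.
# Distance counts mismatches over the questions both vectors have data for.

def distance(answers, profile):
    """Hamming-style distance over questions present in both vectors."""
    shared = [(a, p) for a, p in zip(answers, profile)
              if a is not None and p is not None]
    if not shared:
        return float("inf")
    return sum(1 for a, p in shared if a != p)

def guess(answers, database, k=1):
    """Return the k objects whose stored profiles best match the answers."""
    ranked = sorted(database, key=lambda item: distance(answers, item[1]))
    return [name for name, _ in ranked[:k]]

# Made-up database: (object, answers to
# ["alive?", "bigger than a breadbox?", "man-made?"])
db = [
    ("cat",    [1, 0, 0]),
    ("car",    [0, 1, 1]),
    ("pencil", [0, 0, 1]),
]
print(guess([1, 0, 0], db))   # a perfect match ranks "cat" first
```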

ShaunLMason
Is a nearest neighbour algorithm a good choice in this case? It seems it would be far too forgiving of wrong answers, and could end up with a massive number of dimensions, many of which would have no data. (I'm assuming the use of Hamming distance, and one dimension per question.) A decision tree seems a more natural fit.
Kylotan
+14  A: 

I recommend reading about the game here: http://en.wikipedia.org/wiki/Twenty_Questions

In particular the Computers section:

The game suggests that the information (as measured by Shannon's entropy statistic) required to identify an arbitrary object is about 20 bits. The game is often used as an example when teaching people about information theory. Mathematically, if each question is structured to eliminate half the objects, 20 questions will allow the questioner to distinguish between 2^20 or 1,048,576 subjects. Accordingly, the most effective strategy for Twenty Questions is to ask questions that will split the field of remaining possibilities roughly in half each time. The process is analogous to a binary search algorithm in computer science.
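The 20-bit figure is easy to check directly; each ideal yes/no question yields one bit, and each halving question shrinks the candidate pool by a factor of two:

```python
import math

objects = 1_048_576            # 2**20 possible subjects
bits_needed = math.log2(objects)
print(bits_needed)             # 20.0 -- one bit per ideal question

# Equivalently, count the halvings needed to reach a single candidate:
remaining = objects
halvings = 0
while remaining > 1:
    remaining //= 2
    halvings += 1
print(halvings)                # 20
```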

altCognito
That explains some of it. But when you consider incorrect answers and general ambiguity, it still seems not quite so straightforward.
Daddy Warbox
If you look at the link you'll see that these are not really yes/no questions that can divide the field in half each time. While your answer is correct for 20 questions in the abstract, I think that Shaun's answer is more accurate: a simple nearest-neighbor learning algorithm, plus enough user input, allows for some very accurate results.
yx
Ah, true, they are similar, but definitely the nearest neighbor makes more sense.
altCognito
+13  A: 

You can think of it as the binary search algorithm. In each iteration, we ask a question which should eliminate roughly half of the possible word choices. If there are a total of N words, then we can expect to get an answer after about log2(N) questions.

With 20 questions, we should optimally be able to find a word among 2^20 ≈ 1 million words.

One easy way to eliminate outliers (wrong answers) would be to use something like RANSAC. This means that instead of taking into account all the questions which have been answered, you randomly pick a smaller subset which is enough to give you a single answer. You then repeat this a few times with different random subsets of questions, until you see that most of the time you are getting the same result. Then you know you have the right answer.

Of course, this is just one of many ways of solving this problem.
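The RANSAC-style voting idea can be sketched as follows. This is a toy illustration with a made-up three-object database; `identify` simply picks the best-matching object for a given subset of answers, and the consensus loop outvotes the player's one wrong answer:

```python
import random
from collections import Counter

# Made-up database: object -> answers to questions 0..4 (1 = yes, 0 = no).
DB = {
    "cat": [1, 1, 0, 0, 1],
    "dog": [1, 1, 0, 1, 1],
    "car": [0, 0, 1, 1, 0],
}

def identify(subset):
    """Best match over a subset of (question_index, answer) pairs."""
    return max(DB, key=lambda name: sum(DB[name][q] == a for q, a in subset))

def ransac_guess(answers, subset_size=3, rounds=25, seed=1):
    """RANSAC-style consensus: identify the object from many random
    subsets of the answers and keep the guess that wins the most rounds,
    so one or two wrong answers are usually outvoted."""
    rng = random.Random(seed)
    pairs = list(enumerate(answers))
    votes = Counter()
    for _ in range(rounds):
        votes[identify(rng.sample(pairs, subset_size))] += 1
    return votes.most_common(1)[0][0]

# The player means "cat" but answered question 2 incorrectly:
print(ransac_guess([1, 1, 1, 0, 1]))   # -> cat
```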

Yogi
This simple program demonstrates what you are talking about rather well. Once you get there you can click on the `code` link to see it: http://openbookproject.net/py4fun/animal/animal.html
Noctis Skytower
+10  A: 

A decision tree supports this kind of application directly. Decision trees are commonly used in artificial intelligence.

A decision tree is a binary tree that asks "the best" question at each branch to distinguish between the collections represented by its left and right children. The best question is determined by some learning algorithm that the creators of the 20 questions application use to build the tree. Then, as other posters point out, a tree 20 levels deep gives you a million things.

A simple way to define "the best" question at each point is to look for a property that most evenly divides the collection in half. That way, when you get a yes/no answer to that question, you get rid of about half of the collection at each step. This way you can approximate binary search.
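The "most even split" heuristic above can be sketched like this (the animals and question predicates are made up for illustration):

```python
def best_question(objects, questions):
    """Greedy choice: the question whose yes/no split over the remaining
    objects is closest to an even half."""
    def imbalance(name):
        yes = sum(1 for obj in objects if questions[name](obj))
        return abs(2 * yes - len(objects))
    return min(questions, key=imbalance)

# Hypothetical collection and candidate questions:
animals = ["cat", "dog", "shark", "eagle", "frog", "whale"]
questions = {
    "lives in water?": lambda a: a in {"shark", "whale", "frog"},  # 3/3 split
    "is a mammal?":    lambda a: a in {"cat", "dog", "whale"},     # 3/3 split
    "can fly?":        lambda a: a == "eagle",                     # 1/5 split
}
print(best_question(animals, questions))   # picks an even 3/3 splitter
```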

Wikipedia gives a more complete example:

http://en.wikipedia.org/wiki/Decision_tree_learning

And some general background:

http://en.wikipedia.org/wiki/Decision_tree

Nathan Sanders
+1, I would note that was one of the comments in the Atwood article.
altCognito
True, although the BASIC program Animal doesn't have a training algorithm to determine which questions to use and how high in the tree to put them. Performance with a trained decision tree should be much better. (I agree with the commenter that the questions Atwood got look very much like they were generated by the original Animal algorithm and not by a neural network.)
Nathan Sanders
A: 

It bills itself as "the neural net on the internet", and therein lies the key. It likely stores the question/answer probabilities in a sparse matrix. Using those probabilities, it's able to use a decision tree algorithm to deduce which question to ask that would best narrow down the search. Once it narrows the number of possible answers to a few dozen, or once it's reached 20 questions already, it starts reading off the most likely answers.

The really intriguing aspect of 20q.net is that unlike most decision tree and neural network algorithms I'm aware of, 20q supports a sparse matrix and incremental updates.
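Purely as a guess at the mechanics (20q.net's actual implementation isn't public here), a sparse per-object answer matrix with incremental updates might look like:

```python
from collections import defaultdict

class SparseTwentyQ:
    """Rough sketch, not 20q.net's code: per-object answer counts kept in
    a sparse mapping and folded in incrementally after each game."""

    def __init__(self):
        # stats[obj][question] = [yes_count, no_count]; questions never
        # asked about an object simply have no entry (sparsity).
        self.stats = defaultdict(dict)

    def update(self, obj, answered):
        """Incremental training: fold one finished game's answers in."""
        for question, said_yes in answered.items():
            cell = self.stats[obj].setdefault(question, [0, 0])
            cell[0 if said_yes else 1] += 1

    def score(self, obj, answered):
        """How well the player's answers agree with stored probabilities."""
        total = 0.0
        for question, said_yes in answered.items():
            cell = self.stats[obj].get(question)
            if cell is None:
                continue            # no data: neither rewards nor penalises
            yes, no = cell
            p_yes = (yes + 1) / (yes + no + 2)   # Laplace-smoothed
            total += p_yes if said_yes else 1 - p_yes
        return total

    def best_guess(self, answered):
        return max(self.stats, key=lambda obj: self.score(obj, answered))

game = SparseTwentyQ()
game.update("cat", {"alive?": True, "man-made?": False})
game.update("car", {"alive?": False, "man-made?": True})
print(game.best_guess({"alive?": True, "man-made?": False}))   # -> cat
```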

Chris S