Reading about how Google solves the translation problem got me thinking. Would it be possible to build a strong chess engine by analysing several million games and determining the best possible move based largely (completely?) on statistics? There are several such chess databases (this is one that has 4.5 million games), and one could potentially weight moves in identical (or mirrored or reflected) positions using factors such as the ratings of the players involved, how old the game is (to factor in improvements in chess theory) etc. Any reasons why this wouldn't be a feasible approach to building a chess engine?
Well, 4.5 million games still covers only a tiny (infinitesimally small) fraction of all possible games.
And while you would have a large number of winning and losing positions, that would still leave the problem of reducing that data to a usable set of parameters. A very old problem, with neural networks as a standard approach. But neural nets aren't winning chess tournaments.
I like the idea, but the analogy [with text translation] seems to fall short when considering that the context of a sentence in natural language requires much fewer elements than the context of a chess board position (even though the elements of such sentences, namely words, may come from a bigger set than the elements of a chess game, namely the pieces: knight, pawn etc.)
Furthermore, the availability of multi-lingual corpora (documents of various natures, in various languages) far exceeds the number of chess games available in digital form, in particular when considering that chess analysis needs the whole game, whereas for translation purposes one can use each sentence independently of the rest of the text.
As a result, and with the possible exception of the opening portion of the games (when the board positions haven't had much opportunity to diverge relative to other games), the number of chess games required to introduce any statistical significance would have to be astronomical...
Gotta run, but I'll be back with specific estimates on the number of possible chess games (in the absolute, and in the subset of plausible games), and should effectively prove that 4.5 million games is a relatively small sample.
There are approximately 10^123 game trees in chess, out of which you have about 4.5 × 10^6 games in that database. We can ignore the game trees and only consider state-space complexity, of which there are anywhere between 10^43 and 10^50 legal states. Let's assume that all games in that database have unique moves, and that there are 1000 moves on average per game, which gives us 4.5 × 10^9 states. Taking the estimated lower bound of 10^43 possible states, that covers only 4.5 × 10^-34 of all states. I don't know the total number of unique board positions after excluding rotations and reflections, but it would only reduce that by a factor of two or so, which is not very helpful.
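The coverage arithmetic above can be checked in a few lines; the 1000-moves-per-game figure is the generous assumption stated in the text, not a real average:

```python
# Back-of-the-envelope check of the state-space coverage argument.
# Assumes (generously) that every move of every game yields a unique
# position, at 1000 positions per game.
games = 4.5e6                 # games in the database
positions_per_game = 1e3      # generous assumption from the text
observed_states = games * positions_per_game          # 4.5e9

lower_bound_states = 1e43     # estimated lower bound on legal positions
coverage = observed_states / lower_bound_states

print(f"observed states:  {observed_states:.1e}")
print(f"fraction covered: {coverage:.1e}")            # ~4.5e-34
```

Even halving the state count to account for symmetry leaves the covered fraction around 10^-33, which is why exact position matching is hopeless.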
You would need to add more domain knowledge to the statistical engine and work out the level of similarity between two given board positions, since the chance of finding an exactly matching board position (even including reflections and rotations) is only about 1 in 10^35. I think the biggest key here would be finding how similar two given board positions are. That will incorporate a lot more domain knowledge than just simple transformations.
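To make the point concrete, here is a deliberately crude similarity measure: the fraction of occupied squares holding the same piece in both positions. Everything here is illustrative; a usable measure would need far more chess knowledge (pawn structure, king safety, material balance, and so on):

```python
# Crude illustration of board-position similarity: fraction of squares
# that hold the same piece in both positions. Real similarity would need
# much deeper chess knowledge than this square-by-square overlap.
def square_similarity(pos_a, pos_b):
    """pos_a, pos_b: dicts mapping square -> piece, e.g. {'e4': 'P'}."""
    squares = set(pos_a) | set(pos_b)
    if not squares:
        return 1.0
    matches = sum(1 for sq in squares if pos_a.get(sq) == pos_b.get(sq))
    return matches / len(squares)

a = {"e4": "P", "d4": "P", "g1": "N"}   # knight still on g1
b = {"e4": "P", "d4": "P", "f3": "N"}   # knight developed to f3
print(square_similarity(a, b))          # 2 matches out of 4 squares -> 0.5
```

The trouble is exactly what the paragraph above says: two positions can be nearly identical square-by-square yet completely different in evaluation (a single tempo or a hanging piece), which is why simple transformations aren't enough.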
It is a great idea nonetheless that is worth exploring further, although I suspect that it has been tried before given the complexity of chess and the interest around it.
Something like this is already done: it's the underlying concept of opening books.
Due to the nature of the game, computer AIs are notoriously bad in the beginning, when there are so many possibilities and the end goal is still far ahead. They start to improve toward the middle game, when tactical possibilities start to form, and can play perfectly in the endgame, far exceeding the capability of most humans.
To help the AI make good moves in the beginning, many engines rely on opening books instead: basically a statistically derived flowchart of moves. Many games between highly rated players were analyzed, and recommendations are hard-coded into "the book"; while the position is still in "the book", the AI doesn't even "think", and simply follows what "the book" says.
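The flowchart idea can be sketched as a simple lookup table. Real opening books key positions by a hash of the board (e.g. Zobrist hashing) and store move statistics from master games; the position keys and weights below are invented for illustration:

```python
import random

# Minimal sketch of an opening-book lookup. The keys stand in for hashed
# board positions; the (move, weight) entries are made-up illustrative
# data, not real book statistics.
opening_book = {
    "startpos":    [("e2e4", 45), ("d2d4", 40), ("c2c4", 10), ("g1f3", 5)],
    "after_e2e4":  [("c7c5", 50), ("e7e5", 35), ("e7e6", 15)],
}

def book_move(position_key):
    """Sample a move in proportion to its book weight, or return None
    when the position is out of book and the engine must search."""
    entries = opening_book.get(position_key)
    if entries is None:
        return None                      # out of book: start thinking
    moves, weights = zip(*entries)
    return random.choices(moves, weights=weights, k=1)[0]

print(book_move("startpos"))   # one of e2e4 / d2d4 / c2c4 / g1f3
print(book_move("novelty"))    # None -> the engine is on its own
```

Sampling by weight (rather than always playing the top entry) is a common design choice so the engine doesn't become predictable in the opening.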
Some people can also memorize opening books (this is mostly why Fischer invented his random chess variant, so that memorization of openings becomes far less effective). Partly due to this, sometimes an unconventional move is made in the beginning, not because it's statistically the best move according to history, but precisely the opposite: it's not a "known" position, and can take your opponent (human or computer) "out of the book".
On the opposite end of the spectrum, there is something called an endgame tablebase, which is basically a database of previously analyzed endgame positions. Since the positions have been searched exhaustively, one can use it to enable perfect play: given any covered position, one can immediately decide whether it's winning, losing, or drawn, and what the best way is to achieve or avoid that outcome.
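At its core a tablebase probe is just a lookup of a precomputed verdict. Real tablebases (Syzygy, Nalimov) cover every legal position for a given material set; the entries below are invented placeholders for illustration:

```python
# Sketch of an endgame tablebase probe. Each entry maps a position key
# to (outcome for the side to move, distance to mate, best move).
# The keys and entries are invented examples, not real tablebase data.
tablebase = {
    "KQ_vs_K_example_w": ("win",  3, "Qg7"),   # mate in 3 with best play
    "KQ_vs_K_example_b": ("loss", 2, "Ka8"),   # loses, but delays mate
    "K_vs_K_any":        ("draw", None, None), # bare kings: dead draw
}

def probe(position_key):
    """Perfect-play verdict for a known endgame position, or None if the
    material set isn't covered by the tablebase."""
    return tablebase.get(position_key)

outcome, dtm, best = probe("KQ_vs_K_example_w")
print(outcome, dtm, best)   # win 3 Qg7
```

The "exhaustive search" happens offline, once, via retrograde analysis; play time is then a constant-time lookup, which is exactly why perfect play becomes possible there and nowhere else.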
In chess, something like this is only feasible for the opening and endgame, though. The complexity of the middle game is what makes the game interesting. If one could play chess simply by looking up a table, the game wouldn't be as exciting, interesting, and deep as it is.
This general strategy has been tried for a variety of games. Very often people generate a suitably large database of games by having the computer play itself. A quick internet search turns up http://www.cs.princeton.edu/courses/archive/fall06/cos402/papers/chess-RL.pdf - which builds on previous work in backgammon. In chess, though, brute-force look-ahead is very effective for computers; and in general, statistics is much more effective when you can mix in all of the previously known information about the problem, rather than trying to re-learn it from the data. Note that in this link the computer learned what amounts to the evaluation function at the bottom of the look-ahead, not the whole process.
There is something similar that works very well in computer Go: the UCT method. It does not use a known set of games but instead plays a huge number of random games, starting from the current position, while keeping statistics on which moves lead to higher ratios of wins.
The statistics are kept in a tree of moves (similar to the one used in minimax) and influence the choice of the next random game to play: moves with higher win ratios are chosen more often. The growth of the tree is also guided by the games; usually each game adds one leaf to the tree. This leads to the promising paths being explored more deeply.
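The selection rule just described is usually UCB1: pick the child maximizing its average win ratio plus an exploration bonus, so strong moves get replayed while rarely tried ones still get a look. A minimal sketch, with invented statistics (the exploration constant c is a tunable parameter):

```python
import math

# UCB1 selection as used in UCT: average win ratio plus an exploration
# term that shrinks as a move accumulates visits. Statistics below are
# invented for illustration.
def uct_select(children, total_visits, c=1.4):
    """children: dict move -> (wins, visits). Returns the move to try next."""
    def score(stats):
        wins, visits = stats
        if visits == 0:
            return float("inf")       # always try unvisited moves first
        exploit = wins / visits
        explore = c * math.sqrt(math.log(total_visits) / visits)
        return exploit + explore
    return max(children, key=lambda move: score(children[move]))

children = {"e2e4": (30, 50), "d2d4": (20, 40), "b1a3": (0, 0)}
print(uct_select(children, total_visits=90))   # b1a3: unvisited, tried first
```

After every move has been visited, the balance shifts toward moves with higher win ratios, which is exactly the "promising paths explored more deeply" behavior described above.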