views:

187

answers:

5

I decided to write a small program that solves TicTacToe in order to try out the effect of some pruning techniques on a trivial game. The full game tree using minimax to solve it only ends up with 549,946 possible games. With alpha-beta pruning, the number of states required to evaluate was reduced to 18,297. Then I applied a transposition table that brings the number down to 2,592. Now I want to see how low that number can go.

The next enhancement I want to apply is a strategic reduction. The basic idea is to combine states that have equivalent strategic value. For instance, on the first move, if X plays first, there is nothing strategically different (assuming your opponent plays optimally) about choosing one corner instead of another. In the same situation, the same is true of the center of the walls of the board, and the center is also significant. By reducing to significant states only, you end up with only 3 states for evaluation on the first move instead of 9. This technique should be very useful since it prunes states near the top of the game tree. This idea came from the GameShrink method created by a group at CMU, only I am trying to avoid writing the general form, and just doing what is needed to apply the technique to TicTacToe.

In order to achieve this, I modified my hash function (for the transposition table) to enumerate all strategically equivalent positions (using rotation and flipping functions), and to only return the lowest of the values for each board. Unfortunately now my program thinks X can force a win in 5 moves from an empty board when going first. After a long debugging session, it became apparent to me the program was always returning the move for the lowest strategically significant move (I store the last move in the transposition table as part of my state). Is there a better way I can go about adding this feature, or a simple method for determining the correct move applicable to the current situation with what I have already done?

A: 

Why do you need to make the transposition table mutable? The best move does not depend on the history.

namin
+1  A: 

You need to return the (reverse) transposition along with the lowest value position. That way you can apply the reverse transposition to the prospective moves in order to get the next position.

Chris Dodd
This is a technique I have found very valuable in the past, particularly for very complex games like when I was working on poker, but it felt like it would be more effort than it was worth for this simple project. I have never written a general form state mapper either. Do you know of any? Might be a good project to take on sometime.
NickLarsen
+2  A: 

You're on the right track when you're thinking about reflections and rotations. However, you're applying it to the wrong place. Don't add it to your transposition table or your transposition table code -- put it inside the move generation function, to eliminate logically equivalent states from the get-go.

Keep your transposition table and associated code as small and as efficient as possible.

Shaggy Frog
This helped a lot. Funny enough though, as soon as I did this, I ran into all kinds of problems. Turns out I was focusing on solving the game up front, and attempting to play from previously evaluated positions, which sometimes had not been evaluated. Once I modified it to just re run the search for each move, all the problems went away. Now I am on to move ordering techniques. Do you know off the top of your head of any good move ordering publications?
NickLarsen
Not offhand as move ordering is usually heavily tied in with the domain. Near the top of the search tree it can be quite important but near the bottom it might cost more time than it saves in node expansions. Consider only doing it at nodes (say) >=3 ply from the bottom. One cheap and easy thing to do in general is to do a depth-1 search for each node and then sort based on that, which helps you leverage your existing evaluation function.
Shaggy Frog
+4  A: 

My gut feeling is that you are using too big of a hammer to attack this problem. Each of the 9 spots can only have one of two labels: X or O or empty. You have then at most 3^9 = 19,683 unique boards. Since there are 3 equivalent reflections for every board, you really only have 3^9 / 4 ~ 5k boards. You can reduce this by throwing out invalid boards (if they have a row of X's AND a row of O's simultaneously).

So with a compact representation, you would need less than 10kb of memory to enumerate everything. I would evaluate and store the entire game graph in memory.

We can label every single board with its true minimax value, by computing the minimax values bottom up instead of top down (as in your tree search method). Here's a general outline: We compute the minimax values for all unique boards and label them all first, before the game starts. To make the minimax move, you simply look at the boards succeeding your current state, and pick the move with the best minimax value.

Here's how to perform the initial labeling. Generate all valid unique boards, throwing out reflections. Now we start labeling the boards with the most moves (9), and iterating down to the boards with least moves (0). Label any endgame boards with wins, losses, and draws. For any non-endgame boards where it's X's turn to move: 1) if there exists a successor board that's a win for X, label this board a win; 2) if in successor boards there are no wins but there exists a draw, then label this board a draw; 3) if in successor boards there are no wins and no draws then label this board a loss. The logic is similar when labeling for O's turn.

As far as implementation goes, because of the small size of the state space I would code the "if there exists" logic just as a simple loop over all 5k states. But if you really wanted to tweak this for asymptotic running time, you would construct a directed graph of which board states lead to which other board states, and perform the minimax labeling by traversing in the reverse direction of the edges.

Eric
I'd vote for this as the answer, but you're missing the point of the exercise which is to use a specific hammer.Still, you're right in that there aren't 549,946 possible games of TicTacToe (even including unselected, there isn't that many possible *states*, let alone games). The idea is interesting, but the poster needs to work out the details of what can happen a bit more first. Start with the 512 possible ending states for a 3x3 grid, eliminate the impossible, and *then* work on equivalent moves and states.
jmoreno
Thanks, maybe I can work out more details to a quick algorithm. I understand your point about the exercise, but I would suggest that there is little value in using a technique that poorly fits a problem for that particular problem, especially when alternatives are not only cheaper, simpler, and faster. Using a technique that doesn't fit a problem also may limit what you can learn about that technique (i.e. in the real world if you would have always used an alternative, what reusable knowledge/experience would you have gained from the exercise?)
Eric
I really appreciate this answer and I have a few questions, and a few points. If you include move history in your state, as basic minimax implicitly does, there are exactly 549,946 possible reachable states which terminate at the moment the board fills or either player makes 3 in a row, and far more if you allow the game to progress beyond finding a winner on a non empty board, though I did not. I do agree the extra work is worthless however, and clearly basic minimax requires an enormous effort for even trivial games, but that does not make it bad for exercise.
NickLarsen
The use of a transposition table should reduce the tree size to only the number of states possibly reachable in a game regardless of history, which is great because history does not affect the available moves in tic tac toe and therefore has no effect on the optimal move. Using only a transposition table my small program shows 5478 reachable states in tic tac toe, which I think is correct for my rules.
NickLarsen
Since board spots can be blank, the possible number of states no matter what is 3^9 = 19683. Realizing that valid, reachable states have either the same number of X's as O's or 1 more X than O's, and many states are not reachable due to my rules, 5478 seems reasonable. 2^9 only represents the leaf nodes for filling every position on the board in no particular order.
NickLarsen
I have some questions about your algorithm but they will have to wait until after dinner.
NickLarsen
You're right -- I completely forgot about incomplete games. So with X, O or blank, we start off with 3^9 = 19,683 as you mentioned. For every one of these 19,683 boards, there are three other redundant reflections of it, so we really have 3^9 / 4 = 4,920. Any additional rules you've got would cut this down even further.Minimax is just a rule of how to propagate the value of states several moves ahead back up to the present. In general it isn't committed to tree search vs. graph search. The algorithm I outlined interprets the game as a graph, while your approach interprets it as a tree.
Eric
The use of tree search in this game blows up quickly because there are an exponential number of paths that reach an identical state in the game tree, and the transposition table can help reduce that. But if you search the game graph right from the start, you avoid all those exponential paths in the first place. Most interesting games have graphs too large to fit in memory, so you would search the tree instead, but not this one.
Eric
Assuming you must use a tree search algorithm, the next improvement I would add to your program is to just include a cache that maps states to minimax values. The cache will never grow beyond memory limits so don't worry about evicting old entries. During search, check the current state against the cache. On a cache-hit, return immediately the previously computed minimax value in the cache. On a cache-miss, proceed normally with your algorithm, and when it returns from searching that subtree of this state, insert the newly computed minimax value along with the state into your cache.
Eric
Incidentally, the minor change here would be that instead of caching a really large set of move sequences (as your transposition table does), we're caching the much smaller set of states instead.
Eric
I think you misunderstood how I am using a transposition table. I do not not take into account move history when calculating the position hash, and therefore do not store move history as part of my state. The cache you suggest describes my transposition table. As part of my state, I store the best follow up move to this position so I do not have to check the minimax values for all subsequent states of the current position, but not the path leading to a particular state.
NickLarsen
If I am not mistaken, a transposition table which only takes relevant move history into account (think en passant for chess) exactly reduces the search to a graph, hence its power.
NickLarsen
If I understand your algorithm correctly, first I should enumerate all of the possible reachable positions, calculate and store their minimax values in a state graph and traverse the graph as the game is being played. If that is correct, can you explain how that is different than using a minimax search and storing evaluated states in a transposition table? If not, can you explain your solution in a little more detail?
NickLarsen
Your calculation for the number of states, as already mentioned, is off by a factor of magnitude, which scuttles the rest of your answer.
Shaggy Frog
@NickLarsen I did misunderstand your use of a transposition table. I think you are in fact doing the same thing as I had described with a cache, so there is no difference. My original answer was an alternative method of computing the same result without using tree search.
Eric
If you allow your transposition table to store all nodes down to the leaves, you don't have to do any search after each move as the game progresses. You can just use your transposition table as a lookup table after the initial computation. You can't typically allow your transposition table to store the entire tree but you can here since the state space will just be about 5k nodes (don't try this Shaggy because it looks like this order of magnitude difference exceeds your computer's memory capacity).
Eric
@Eric You are describing solving the game. That's fine but it doesn't really answer the OP's question, nor will it help other users with similar questions (using rotations and reflections to reduce the branching factor), especially if those users have non-trivial domains (it took my former supervisor years to solve checkers using your approach). Meanwhile your original answer still contains the obvious error calculating the size of the state space. Finally, your snark about memory is unhelpful.
Shaggy Frog
@Shaggy I've revised the answer as you have suggested to correct the state-space estimate. As your other comments are not about helping Nick but rather directed personally at me, I won't be responding to them. Thanks.
Eric
@Eric If that's your policy you should reconsider pitching first-strike personal comments of your own. Just saying.
Shaggy Frog
A: 

There is a lot that can be said about this, but I will just give one tip here which will reduce your tree size: Matt Ginsberg developed a method called Partition Search which does equivalency reductions on the board. It worked well in Bridge, and he uses tic-tac-toe as an example.

Nathan S.
Thanks for the reading link. I'll take a look soon.
NickLarsen