views:

158

answers:

3

I've wasted my entire day trying to use the minimax algorithm to make an unbeatable tictactoe AI. I missed something along the way (brain fried).

I'm not looking for code here, just a better explanation of where I went wrong.

Here is my current code (the minimax method always returns 0 for some reason):

from copy import deepcopy


class Square(object):
    def __init__(self, player=None):
        self.player = player

    @property
    def empty(self):
        return self.player is None


class Board(object):
    winning_combos = (
        [0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 3, 6], [1, 4, 7], [2, 5, 8],
        [0, 4, 8], [2, 4, 6],
    )

    def __init__(self, squares={}):
        self.squares = squares
        for i in range(9):
            if self.squares.get(i) is None:
                self.squares[i] = Square()

    @property
    def available_moves(self):
        return [k for k, v in self.squares.iteritems() if v.empty]

    @property
    def complete(self):
        for combo in self.winning_combos:
            combo_available = True
            for pos in combo:
                if not pos in self.available_moves:
                    combo_available = False
            if combo_available:
                return self.winner is not None
        return True

    @property
    def player_won(self):
        return self.winner == 'X'

    @property
    def computer_won(self):
        return self.winner == 'O'

    @property
    def tied(self):
        return self.complete == True and self.winner is None

    @property
    def winner(self):
        for player in ('X', 'O'):
            positions = self.get_squares(player)
            for combo in self.winning_combos:
                win = True
                for pos in combo:
                    if pos not in positions:
                        win = False
                if win:
                    return player
        return None

    @property
    def heuristic(self):
        if self.player_won:
            return -1
        elif self.tied:
            return 0
        elif self.computer_won:
            return 1

    def get_squares(self, player):
        return [k for k,v in self.squares.iteritems() if v.player == player]

    def make_move(self, position, player):
        self.squares[position] = Square(player)

    def minimax(self, node, player):
        if node.complete:
            return node.heuristic
        a = -1e10000
        for move in node.available_moves:
            child = deepcopy(node)
            child.make_move(move, player)
            a = max([a, -self.minimax(child, get_enemy(player))])
        return a


def get_enemy(player):
    if player == 'X':
        return 'O'
    return 'X'
+1  A: 

As you already know the idea of Minimax is to deep search for the best value, assuming the opponent will always play the move with the worst value (worst for us, so best for them).

The idea is, you will try to give a value to each position. The position where you lose is negative (we don't want that) and the position where you win is positive. You assume you will always try for the highest-value position, but you also assume the opponent will always aim at the lowest-value position, which has the worst outcome for us, and the best for them (they win, we lose). So you put yourself in their shoes, try to play as good as you can as them, and assume they will do that.
So if you find out you have possible two moves, one giving them the choice to win or to lose, one resulting in a draw anyway, you assume they will go for the move that will have them win if you let them do that. So it's better to go for the draw.

Now for a more "algorithmic" view.

Imagine your grid is nearly full except for two possible positions.
Consider what happens when you play the first one :
The opponent will play the other one. It's their only possible move so we don't have to consider other moves from them. Look at the result, associate a resulting value (+∞ if won, 0 if draw, -∞ if lost : for tic tac toe you can represent those as +1 0 and -1).
Now consider what happens when you play the second one :
(same thing here, opponent has only one move, look at the resulting position, value the position).

You need to choose between the two moves. It's our move, so we want the best result (this is the "max" in minimax). Choose the one with the higher result as our "best" move. That's it for the "2 moves from end" example.

Now imagine you have not 2 but 3 moves left.
The principle is the same, you want to assign a value to each of your 3 possible moves, so that you can choose the best.
So you start by considering one of the three moves.
You are now in the situation above, with only 2 possible moves, but it's the opponent's turn. Then you start considering one of the possible moves for the opponent, like we did above. Likewise, you look at each of the possible moves, and you find an outcome value for both of them. It's the opponent move, so we assume they will play the "best" move for them, the one with the worst turnout for us, so it's the one with the lesser value (this is the "min" in minimax). Ignore the other one ; assume they will play what you found was best for them anyway. This is what your move will yield, so it's the value you assign to the first of your three moves.

Now you consider each of your other possible 2 moves. You give them a value in the same manner. And from your three moves, you choose the one with the max value.

Now consider what happens with 4 moves. For each of your 4 moves, you look what happens for the 3 moves of your opponent, and for each of them you assume they will choose the one that gives you the worst possible outcome of the best of the 2 remaining moves for you.

You see where this is headed. To evaluate a move n steps from the end, you look at what may happen for each of the n possible moves, trying to give them a value so that you can pick the best. In the process, you will have to try to find the best move for the player that plays at n-1 : the opponent, and choose the step with the lesser value. In the process of evaluating the n-1 move, you have to choose between the possible n-2 moves, which will be ours, and assume we will play as well as we can at this step. Etc.

This is why this algorithm is inherently recursive. Whatever n, at step n you evaluate all possible steps at n-1. Rinse and repeat.

For tic-tac-toe todays machines are far powerful enough to compute all possible outcomes right off from the start of the game, because there are only a few hundred of them. When you look to implement it for a more complex game, you will have to stop computing at some point because it will take too long. So for a complex game, you will also have to write code that decides whether to continue looking for all possible next moves or to try to give a value to the position now and return early. It means you will also have to compute a value for position that is not final - for example for chess you would take into account how much material each opponent has on the board, the immediate possibilities of check without mate, how many tiles you control and all, which makes it not trivial.

Jean
@Jean - see my latest code update (it's still broken). Is there anything you can glean from that which might help me?
orokusaki
+2  A: 

Step 1: Build your game tree

Starting from the current board generate all possible moves your opponent can make. Then for each of those generate all the possible moves you can make. For Tic-Tac-Toe simply continue until no one can play. In other games you'll generally stop after a given time or depth.

This looks like a tree, draw it yourself on a piece of paper, current board at top, all opponent moves one layer below, all your possible moves in response one layer below etc.

Step 2: Score all boards at the bottom of the tree

For a simple game like Tic-Tac-Toe make the score 0 if you lose, 50 tie, 100 win.

Step 3: Propagate the score up the tree

This is where the min-max come into play. The score of a previously unscored board depends on its children and who gets to play. If it is the opponents play you assume he always makes the worst possible move, so you choose the minimum of all the children. If it is your turn you assume you'll make the best possible move, so you choose the maximum.

Step 4: Pick your best move

Now play the move that results in the best propagated score among all your possible plays from the current position.

Try it on a piece of paper, if starting from a blank board is too much for you start from some advanced Tic-Tac-Toe position.

Using recursion: Very often this can be simplified by using recursion. The "scoring" function is called recursively at each depth and depending on whether or not the depth is odd or even it select max or min respectively for all possible moves. When no moves are possible it evaluates the static score of the board. Recursive solutions (e.g. the example code) can be a bit trickier to grasp.

Guy
@Guy - see my code update - is there anything in my minimax method that stands out as an obvious idiot mistake (it's broken and only returns 0 as of now)?
orokusaki
I would not do the negating trick in minimax myself. Do you know you're applying min and max at the right levels? Which function are you calling? We're missing the actual game play code I think... To debug this I'd recommend starting from a non-blank fairly deep position and follow the execution while matching to the paper version...
Guy
+1  A: 

Your complete function is not working as expected, causing games to be declared tied before anything can happen. For instance, consider this setup:

>> oWinning = {
 1: Square('X'),
 3: Square('O'), 4: Square('X'),
 6: Square('O'), 8: Square('X'),
}
>> nb = Board(oWinning)
>> nb.complete
True
>> nb.tied
True

This should be a win for the computer on the next move. Instead, it says the game is tied.

The problem is that your logic in complete, right now, checks to see if all of the squares in a combo are free. If any of them are not, it presumes that that combo can't be won with. What it needs to do is check if any positions in that combo are occupied, and so long as all of those combos are either None or the same player, that combo should be considered still available.

e.g.

def available_combos(self, player):
    return self.available_moves + self.get_squares(player)

@property
def complete(self):
    for player in ('X', 'O'):
        for combo in self.winning_combos:
            combo_available = True
            for pos in combo:
                if not pos in self.available_combos(player):
                   combo_available = False
            if combo_available:
                return self.winner is not None
    return True

Now that I properly tested this with the updated code I'm getting the expected result on this test case:

>>> nb.minimax(nb, 'O')
-1
>>> nb.minimax(nb, 'X')
1
Conspicuous Compiler
@Conspicuous - What an idiot mistake on my part. Thanks so much. BTW, 1 question: How does the minimax know what the best node is going down the tree when we're only using 3 values, and they're not additive on the way up (e.g. +1 wins at the leaf level, but at the next node up from leaf, it would compare against potentially other +1s right?
orokusaki
@Conspicuous - also, this thing now takes about 2 minutes to complete. Would you implement alpha beta pruning to this?
orokusaki
@orokusaki: Frankly, I'd remove the Square() object altogether, change the Board to store a string 9 characters long, with a blank indicating an untaken move, memoize the heuristic of board positions, and before recursing, check whether there was a memoized heuristic for the board as it is, and the board rotated 90, 180, and 270 degrees. That will complicated the code a fair bit, granted.
Conspicuous Compiler
@orokusaki: (Oh, to clarify, any Tic-Tac Toe game rotated a multiple of 90 degrees is essentially identical for scoring purposes.)
Conspicuous Compiler
Oh, sorry, overlooked the first question. It doesn't matter how many possible paths to victory a given node presents. This is because we presume both players are using minimax, so if a tree represents a more desirable route for us, our opponent will avoid it anyway. In a zero-sum game like this, its provable that an opponent who doesn't use minimax will wind up worse off than if he did so long as we use minimax, making it the optimal strategy.
Conspicuous Compiler