Hello,
I'm writing a Connect4 game with an AI opponent using adversarial search techniques and I have somewhat run into a wall. I feel that I'm not far from a solution but that there's perhaps a problem where I'm switching perspectives (as in: the perspective of which participant I'm basing my evaluation scores on), missing a minus sign somewhere or something like that.
The problem is either that in the variations that I've tried that the AI chooses not to block the player when the player has three-in-a-row but otherwise the AI plays a perfect game, or that he prefers to block the player, even if he has the chance to win the game. It also seems to matter whether or not the search depth is an even or an uneven number, as the AI is pants-on-head retarded at a 6-ply search, which is pretty telling that something is wrong.
Search
The algorithm used is negamax with alpha-beta pruning and is implemented as follows:
private int Negamax(int depth, int alpha, int beta, Player player)
{
Player winner;
if (Evaluator.IsLeafNode(game, out winner))
{
return winner == player ? (10000 / depth) : (-10000 / depth);
}
if (depth == Constants.RecursionDepth)
{
return Evaluator.Evaluate(game, depth, player);
}
foreach (var move in moves)
{
int row;
if (board.DoMove(move, player, out row))
{
var value = -Negamax(depth + 1, -beta, -alpha, (Player)1 - (int)player);
board.UndoMove(move, row, player);
if (value > alpha)
{
alpha = value;
if (player == Player.AI)
{
bestColumn = move;
}
}
if (alpha >= beta)
{
return alpha;
}
}
}
return alpha;
}
I don't suspect that the problem is in this function, but it might be.
Evaluation
I've based the evaluation function off of the fact that there are only 69 possible ways to get four-in-a-row on a 7x6 board. I have a look-up table of about 350 items that contains the hardcoded information for every column and row which win-combinations the row+column is a part of. For example, for row 0 and column 0, the table looks like this:
//c1r1
table[0][0] = new int[3];
table[0][0][0] = 21;
table[0][0][1] = 27;
table[0][0][2] = 61;
This means that column 0, row 0 is part of win-combination 21, 27 and 61.
I have a second table, that contains for both players how many stones it has in each of the win-combinations. When I do a move then I do the following:
public bool DoMove(int column, Player p, out int row)
{
row = moves[column];
if (row >= 0)
{
Cells[column + row * Constants.Columns] = p;
moves[column]--;
var combinations = this.Game.PlayerCombinations[p];
foreach (int i in TerminalPositionsTable.Get(column,row))
{
combinations[i]++;
}
return true;
}
else
{
return false;
}
}
The opposite is of course being done for UndoMove
.
So after doing a move on column 0, row 0 by Player.Human
, the table will be filled with a value of 1 at index 21, 27 and 61. If I do another move in a cell that is also part of win-combination 27, then the player combination table gets incremented at index 27 to 2.
I hope I have made that clear, as it's used in the evaluation function to very quickly determine how close a player is to scoring four-in-a-row.
The evaluation function, where I suspect the problem lies, is as follows:
public static int Evaluate(Game game, int depth, Player player)
{
var combinations = game.PlayerCombinations[player];
int score = 0;
for (int i = 0; i < combinations.Length; i++)
{
switch (combinations[i])
{
case 1:
score += 1;
break;
case 2:
score += 5;
break;
case 3:
score += 15;
break;
}
}
return score;
}
So I simply loop through the 69 possible win-combinations and add an amount to the score based on whether it's a single stone, two-in-a-row or three.
The part I'm still confused about in this whole adversarial search is whether I should care which player is doing a move? I mean, should I pass in the player like I do here, or should I always evaluate the board from the perspective of the AI player? I've tried many combinations of aiScore - humanScore
, or just always look from the perspective of Player.AI
, and such. But I've hit a dead end and every combination I've tried was pretty flawed.
So:
- Is the logic of my evaluation solid in its basis?
- When should I 'switch perspective'?
Any help would be much appreciated.
Update
I've implemented Brennan's suggestions below, and while it has definitely much improved, for some reason it doesn't block three-in-a-rows on any column but the two left and right-most, and only when the search-depth is uneven. The AI is unbeatable at even search depths, but only until depth 8 and up. Then it refuses to block again. This is pretty telling that I'm probably very close, but still have some crucial flaw.
Perhaps this has to do with me setting the column the AI should drop a stone in as Brennan commented, but I don't know when else to set it. Setting it only at depth 0 doesn't work.
Update 2
Edited the code as it looks like now with Brennan's changes.
Update 3
Created a Github repo with the full code. If you don't know how to work Git, just download a zip file from here.
It's a .NET 4.0 project, and running it will create log files of the negamax algorithm in your documents/logs directory. The solution also contains a test project, that contains a test for every board column whether or not the AI chooses to block the player when the player has three-in-a-row there.