views:

195

answers:

7

Hi,

I'm generating a Tic-Tac-Toe game tree (9 seconds after the first move), and I'm told it should take only a few milliseconds. So I'm trying to optimize it, I ran it through CodeAnalyst and these are the top 5 calls being made (I used bitsets to represent the Tic-Tac-Toe board):

std::_Iterator_base::_Orphan_me

std::bitset<9>::test

std::_Iterator_base::_Adopt

std::bitset<9>::reference::operator bool

std::_Iterator_base::~_Iterator_base

void BuildTreeToDepth(Node &nNode, const int& nextPlayer, int depth)
{
   if (depth > 0)
   {
      //Calculate gameboard states
      int evalBoard = nNode.m_board.CalculateBoardState();
      bool isFinished = nNode.m_board.isFinished();
      if (isFinished || (nNode.m_board.isWinner() > 0))
      {
         nNode.m_winCount = evalBoard;
      }
      else
      {
         Ticboard tBoard = nNode.m_board;
         do
         {
            int validMove = tBoard.FirstValidMove();
            if (validMove != -1)
            {
               Node f;
               Ticboard tempBoard = nNode.m_board;
               tempBoard.Move(validMove, nextPlayer);
               tBoard.Move(validMove, nextPlayer);
               f.m_board = tempBoard;
               f.m_winCount = 0;
               f.m_Move = validMove;
               int currPlay = (nextPlayer == 1 ? 2 : 1);
               BuildTreeToDepth(f,currPlay, depth - 1);
               nNode.m_winCount += f.m_board.CalculateBoardState();
               nNode.m_branches.push_back(f);
            }
            else
            {
               break;
            }
         }while(true);
      }
   }
}

Where should I be looking to optimize it? How should I optimize these 5 calls (I don't recognize them=.

+1  A: 

You're getting all wrapped up in data structure. Don't build the tree, just walk it. Have only one copy of the board. At each node in the search tree, just modify the board, and on the way back out, un-modify it.

And if you want to know what it's doing, just hit the pause button at random. It will show you why it's in those routines you don't recognize that are taking all the time.

Mike Dunlavey
But I want to generate the whole tree, that's the thing. I'm just learning about tree structures so I'm doing this on purpose... but clearly not in an efficient way.
cam
@cam: OK, you want to make a tree structure. As I said, just use the "pause" button, and it will tell you why those routines are being called. Just look up the call stack, and you will see why, and decide for yourself if it is necessary or if there is a better way. (I'm trying to teach you to fish, not give you the answer.)
Mike Dunlavey
Exactly. The fact that "Node f;" occurs in this function means lots of object creation deletion, and copying. To me board state should be an unsigned int (2 bits per square) and all those function calls should be written explicitly inline, not as functions. Too much OOP for Tic Tac Toe. OTOH it looks like you want code that is independent of the particular game.
phkahler
@phkahler: Yeah, that's how I would approach it too. The other issue is I'm trying to get the OP to see how brainlessly-simple it is to find out what's making all those low-level functions get called. CodeAnalyst is just confusing him (or her).
Mike Dunlavey
@phkahler, your concept of inline is very wrong headed. Explicitly inlining wont gain you anything, the compiler is more than capable of translating functions to completely inlined code.
caspin
@Caspin so where exactly is the ridiculous amount of time being spent? Don't write a bunch of extra code and then assume the compiler will remove it for you - I think that's wrong headed.
phkahler
@Caspin, @phkahler: You guys having fun? To me, inlining is one of those things that are probably never worth debating.
Mike Dunlavey
+2  A: 

Those functions are typically trivial. That means that an optimized ("release") build will typically have them inlined. However, in a debug build they're not. The result is that a debug build is slower, but allows you to set breakpoints on those functions. So, the "milliseconds comment" should be applied to the release build, where you wouldn't even have those functions anymore.

MSalters
100MS vs. 9 seconds? Do I understand that you're suggesting that the debugging overhead will add nearly 900 times the processing effort?
San Jacinto
Right, I've built in Release. It's obviously a problem with my code. I'm just looking to see if it's in the algorithm part or not. If it helps, the Ticboard datastructure it three bitsets, and I haven't overridden the copy constructors or anything.
cam
@san Jacinto: Depending on the code and how much it relies on calling potentially inlined code a lot, yes, sometimes the tradeoff between debug and release can be gigantic.I had a case with millions of such calls, where a release build would calculate in minutes, and the debug build would take half a day.
MadKeithV
A debugging overhead that's 900 times? Certainly possible. 90 times (9000 ms/100ms) I have personally seen.
MSalters
A: 

Honestly, and I don't mean this as a slam against you, you're asking us to examine a poorly documented piece of code that is a smaller part to a larger code base. We don't have the context that gives much information. I personally am also turned off by examining others' code when it doesn't appear that they've done all they can do to examine it themselves yet (and I don't mean this to say I'm annoyed at you or anything, just that my eyes are glazing over looking at your code).

I recommend you run your code through a profiler and determine what exactly is taking so much time. Treat this profiling like you're debugging. When you find a module taking a long time, examine that module in small sections (as if you're hunting for a bug) to see why.

This will allow you to ask a much more informed question if you still need to ask something.

San Jacinto
Which I've done. I've Googled extensively on what CodeAnalyst brought up, I posted this on Stackoverflow to see as a last-resort to see if anyone can see anything that would inhibit performance. If not, then I can assume that generating the entire Tic-Tac-Toe game tree in memory is limited to 9 seconds and the person who said nano seconds was exaggerating. It doesn't need to be documented seeing as all the functions are pretty self explanatory, and it's a simple recursive function.
cam
@cam CalculateBoardState() is trivial? Move() is trivial? BuildTreeToDepth() is trivial? I don't think you're giving us the benefit of your code analysis. Nor is it trivially apparent what these functions do. What I'm saying is that you're not asking a very good question. If you don't want to put any more effort into it then ok, more power to you. I hope you get what you need (sincerely).
San Jacinto
A: 

Let me add that if your system is taking 9 seconds to do its work, that means that something is being called billions and billions of times more than it should. If you don't have release level profiling abilities, place a few global counters in your code and increment them every time the code they are in is called. This will give you a poor man's profile that will work on release builds. If you see a billions calls somewhere you don't expect, you now have a place to look closer.

In reality, Tic-Tac-Toe's entire move tree should be trivial to store as a hash if you need the speed. 9! moves just isn't that big of a domain (anymore). 362k moves shouldn't break the bank, and that's a brute force analysis. That domain can be cut way down when you take into consideration all the various symetries of data.

Bah, here's how I would do it if I was coding it since people have latched onto my back of the envelope math:

I wouldn't even go the tree route, just some rules and be done.
Turn 1. Go in center.
Turn 2. If center unoccupied, go there, else go in corner
Turn 3. If opponent filled a corner, go in opposite corner, else go in corner - you've won.
Turn 4. Block if needed. If Xs occupy opposite corners, fill edge. If Xs occupy center and opposite corner, fill corner. If Xs occupy opposite edges, fill corner and win. If Xs fill adjacent edges, Fill corner betweem them.
Turn 5 Win if possible. Block if needed. Else go in corner opposite of adjacent edge move and win.
Turn 6-9 Win if possible. Block if needed. Else, place random towards draw.

Michael Dorgan
not necessarily. admittedly tic-tac-toe is pretty simple but game trees can get quite large very quickly. Assuming an optimally stored game tree (which we'll never have), the number of bits required to store the chess game tree is more than number of molecules in our solar system. So 9 seconds might not be so bad.
caspin
@Caspin: Do any game-playing lookahead programs actually store game trees? Other than enough information to choose the next move, who's gonna read it? It's just a giant garbage generator.
Mike Dunlavey
I agree that we should have "poor man" methods of profiling, but I think you can do better than counters and/or timers. http://stackoverflow.com/questions/406760/whats-your-most-controversial-programming-opinion/1562802#1562802
Mike Dunlavey
No successful gameplaying algorithms store large portions of the game tree. Anything beyond connect 4 is infeasible. The checkers AI chinook stores more than most and it only stores games states with 8 pieces or less (ie each side has 4 pieces). A 9 piece database would be orders of magnitude larger than the 8 piece.
caspin
@Capsin: the total number of states in tic-tac-toe is obviously less than 3^9 = 19683. It is entirely feasible to store the entire game tree, including all transitions, in RAM on any modern PC.
MSalters
A: 

You've posted far too little of your code.

DeadMG
-1, This should have been a comment
Hasturkun
Ah, you're probably right.
DeadMG
+3  A: 

The tic-tac-toe game tree is very redundant. Eliminating rotated and mirrored boards will reduce the final ply of the game tree by 3 or 4 orders of magnitude. No amount of optimizations will make bubblesort as fast as introsort.

struct Game_board;

struct Node
{
   Game_board game_board;
   Node* parent;
   std::vector<Node*> children;
   enum { X_Win, Y_Win, Draw, Playing } outcome;
};

// returns the same hash value for all "identical" boards.
// ie boards that can be rotated or mirrored to look the
// same will have the same hash value
int hash( const Game_board& game_board );

// uses hash() function to generate hashes from Node*
struct Hash_functor;

// nodes yet to be explored.
std::hash_set<Node*,Hash_functor> open;

//nodes already explored.
std::hash_set<Node*,Hash_functor> closed;

while( ! open.empty() )
{
   Node* node_to_expore = get_a_node( open );
   assert( node_to_expore not in close or open sets )
   if( node_to_expore is win lose or draw )
   {
      Mark node as win lose or draw
      add node to closed set
   }
   loop through all children of node_to_expore
   {
      if( child in close )
      {
         add node from closed set to children list of node_to_expore
      }
      else if( child in open )
      {
         add node from open set to children list of node_to_explore
      }
      else
      {
         add child to open set
         add child to children list of node_to_expore
      }
   }
}
caspin
A: 

You are asking how to optimize the code however you should also be asking how to optimize the algorithm.

There are two things that I immediately see.

  1. As "Michael Dorgan" stated generate the tree of moves once.

  2. How many broads are you generating in your tree? 362880? Your code appears to be generating redundant entries. For example, with an empty board there are actually three moves not nine moves. All other combinations are the board rotated (which are equal). You can reduce the number of boards that needs to be generated and speed up the generation of the tree.

Here are the three first moves(rotate the last two board to generate the other boards)

 | |
 |X|
 | |

 |X|
 | |
 | |

X| |
 | |
 | |
mholzmann