I have a very old, very large, fully working C program that plays a board game. I want to convert it (or rather, parts of it) to run in multiple threads so that I can take advantage of multi-core processors. In the old program there is a global UBYTE array called board[]. A great many highly optimized, speed-critical functions manipulate the contents of board[]. I now want the process to work as follows:
Step 1. Perform a large number of operations in a single thread, with many manipulations of the single board[]. These are the things that are too complex to perform on multiple cores.
Step 2. Farm out multiple copies of board[] to a collection of threads, and have each thread spend some time doing its own separate manipulations of its own private board[].
Step 3. The threads finish their work and return some answers to the main thread.
For argument's sake, let's say there will be 32 sub-threads.
One way to do this would be to keep the one global board[], add 32 sub-boards under a different name such as sub_board[32][], and then write a new set of board manipulation functions that work on the new two-dimensional sub_board[][]. But this would ruin my optimization, because every access to the game board would need an additional multiply and add. The new versions of the old board manipulation functions would also be slightly messier.
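To make the comparison concrete, here is a rough sketch of what the two styles look like side by side. BOARD_SIZE, NUM_SUB_BOARDS, fill_board() and fill_sub_board() are made-up names for illustration; the real board size and functions are not shown in this post. Whether the extra index really costs a multiply and add per access depends on the compiler and optimizer, so treat this only as a picture of the source-level difference.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned char UBYTE;

const std::size_t BOARD_SIZE = 64;  // assumed size, for illustration only
const int NUM_SUB_BOARDS = 32;      // one sub-board per sub-thread

UBYTE board[BOARD_SIZE];                      // the existing global board
UBYTE sub_board[NUM_SUB_BOARDS][BOARD_SIZE];  // the extra per-thread boards

// Existing style: the function addresses the single global board directly.
void fill_board(UBYTE value)
{
    for (std::size_t i = 0; i < BOARD_SIZE; i++)
        board[i] = value;
}

// Sub-board style: the same logic, but every access also carries the
// sub-board index t, which is the extra address arithmetic in question.
void fill_sub_board(int t, UBYTE value)
{
    assert(t >= 0 && t < NUM_SUB_BOARDS);
    for (std::size_t i = 0; i < BOARD_SIZE; i++)
        sub_board[t][i] = value;
}
```

Every rewritten function would need the extra parameter t threaded through it, which is the "slightly messier" part.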
I have not been a C++ programmer before (but I'm learning as fast as I can), and someone has suggested the following trick involving C++ (I'm not sure that I've got all the details correct): I leave the existing board[] as is. I leave all the existing board manipulation functions as is. I make a new class (let's call it thread_type) which contains a board[] and a new set of board manipulation functions. Something like this:
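class thread_type
{
public: // members are private by default in a class, so main() could not touch them otherwise
    UBYTE board[BOARD_SIZE]; // board for one slave thread to work with (BOARD_SIZE is a placeholder for the real size)
    void board_manipulation_A(void);
    void board_manipulation_B(void);
    void do_your_stuff(void); // entry point that each slave thread runs
};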
The board manipulation functions are identical to the old ones (so I can cut and paste) other than being defined with "thread_type::" at the start of the function name. Then in main() I have:
class thread_type slave[32];
Now I can manipulate the single global board[] with all my old code in the base thread. Then I can copy the main board[] to each slave[n].board[] and then have
for (i = 0; i < 32; i++)
{
    // there will have to be some extra thread/mutex
    // related code around here but I'm not showing it for simplicity
    slave[i].do_your_stuff();
}
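For reference, this is roughly how I picture the whole thing fitting together, as a minimal sketch using std::thread from C++11. The board size, the answer member, the body of board_manipulation_A() and the run_slaves() helper are all placeholders I made up for this example; the real functions are far larger. It assumes each slave only ever touches its own object, so no mutex is needed while the threads run.

```cpp
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

typedef unsigned char UBYTE;

const std::size_t BOARD_SIZE = 64;  // assumed size, for illustration only
const int NUM_SLAVES = 32;

UBYTE board[BOARD_SIZE];  // global board, used only by the base thread

class thread_type
{
public:
    UBYTE board[BOARD_SIZE];  // private copy for one slave thread
    int answer;               // hypothetical result handed back to main

    // Stand-in for a real manipulation function; inside the class the
    // name board refers to the member, not to the global array.
    void board_manipulation_A(void) { board[0]++; }

    void do_your_stuff(void)
    {
        board_manipulation_A();
        answer = board[0];
    }
};

thread_type slave[NUM_SLAVES];

// Copy the global board to every slave, run each slave in its own thread,
// wait for all of them, then combine the answers in the base thread.
int run_slaves(void)
{
    std::vector<std::thread> workers;
    for (int i = 0; i < NUM_SLAVES; i++)
    {
        std::memcpy(slave[i].board, board, sizeof board);
        workers.emplace_back(&thread_type::do_your_stuff, &slave[i]);
    }
    for (std::thread &w : workers)
        w.join();  // after join() the slave's answer is safe to read

    int total = 0;
    for (int i = 0; i < NUM_SLAVES; i++)
        total += slave[i].answer;
    return total;
}
```

The base thread would do its single-threaded work on the global board[] first and then call run_slaves(). On GCC or Clang this typically needs the -pthread option when compiling and linking.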
Now inside each of the 32 threads, each one will be working on its own separate board[] with code that is pretty much identical to the old original (fully debugged and optimized) code. I could even avoid the cut and paste of the old code altogether by doing some #define tricks, i.e. having the function declarations written like this
void THREAD_OR_BASE board_manipulation_A(void);
and then run through this once with
#define THREAD_OR_BASE // zilch
and once with
#define THREAD_OR_BASE thread_type::
This way I can be quite certain that any time I make a modification to board_manipulation_A() it will appear both in the base thread version and the sub-thread one.
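Here is a small single-file sketch of that idea. Note that it is a variation rather than exactly what is described above: instead of compiling the same source twice under two different #defines, it wraps the shared function body in a function-like macro and expands it twice. BOARD_FUNCTIONS, BOARD_SIZE and the one-line function body are invented for the example. It relies on empty macro arguments, which are allowed from C++11 onwards.

```cpp
typedef unsigned char UBYTE;

const int BOARD_SIZE = 64;  // assumed size, for illustration only

UBYTE board[BOARD_SIZE];  // global board for the base thread

class thread_type
{
public:
    UBYTE board[BOARD_SIZE];  // private board for one sub-thread
    void board_manipulation_A(void);
};

// The shared function body, written once. When expanded as a member
// function, the name board resolves to thread_type::board; when expanded
// as a free function, it resolves to the global board.
#define BOARD_FUNCTIONS(THREAD_OR_BASE)             \
    void THREAD_OR_BASE board_manipulation_A(void)  \
    {                                               \
        board[0] = (UBYTE)(board[0] + 1);           \
    }

BOARD_FUNCTIONS()               // base-thread version: a free function
BOARD_FUNCTIONS(thread_type::)  // sub-thread version: a member function
```

With this in place, board_manipulation_A() changes the global board and some_slave.board_manipulation_A() changes only that object's board, and both come from the same text.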
My questions are: A) Will it all work? B) Did I miss some vital step? C) Could I have achieved the same thing with some simpler method?
Edit: instead of 32 threads, I should have said "as many threads as there are cores"
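As far as I can tell, the standard way to ask for that number in C++11 is std::thread::hardware_concurrency(). A minimal sketch, where choose_thread_count() is just a name I made up: the standard describes the returned value as only a hint, it generally counts hardware threads (logical cores) rather than physical cores, and it may be 0 when the value cannot be determined, so the sketch falls back to 1 in that case.

```cpp
#include <thread>

// Pick how many sub-threads to start based on what the hardware reports.
unsigned int choose_thread_count(void)
{
    unsigned int n = std::thread::hardware_concurrency();
    return n != 0 ? n : 1;  // 0 means "unknown", so fall back to one thread
}
```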