views:

159

answers:

3

I have a very old, very very large, fully working, C program which plays a board game. I want to convert it (or should I say parts of it) to work in multiple threads, so that I can take advantage of multi-core processors. In the old program there is a global UBYTE array called board[]. There are a great many (highly optimized, highly speed critical) functions which manipulate the contents of board[]. I now want to make the process work as follows:

step 1. Perform a large number of operations in a single thread performing many manipulations of the single board[]. These are the things that are too complex to perform on multiple cores.

Step 2. farm out multiple copies of "board[]" to a collection of threads and have each thread spend some time doing their own separate manipulations of their own private "board[]"'s.

Step 3. the threads finish their work and return some answers to the main thread.

For arguments sake lets say there will be 32 sub threads.

Now one way to do this would be to make one global board[] and 32 sub-boards with a different name like sub_board[32][] and then write a new bunch of board manipulation functions that work on the new 2 dimensional sub_board[][], but this would ruin my optimization because there would need to be an additional multiply and add for every access to the game board. Also the new versions of the old board manipulation functions will be slightly messier.

Now I have not been a C++ programmer before (but I'm learning as fast as I can) and someone has suggested the following trick involving C++ (I'm not sure that I've got all the details correct): I leave the existing board[] as is. I leave all the existing board manipulation functions as is. I make a new class (lets call it thread_type) which contains a board[] and a new set of board manipulation functions. Something like this:

class thread_type 
{
    UBYTE board[]; // boards for slave threads to work with
    void board_manipulation_A(void);
    void board_manipulation_B(void);
}

The board manipulation functions are identical to the old ones (so I can cut and paste) other than being declared with "thread_type::" at the the start. Then in main() I have:

class thread_type slave[32];

Now I can manipulate a single global board[] with all my old code in the base thread. Then I can copy the main board[] to slave[n].board[] then have

For (i = 0; i < 32;i++)
{
    // there will have to be some extra thread/mutex 
    // related code around here but I'm not showing it for simplicity

    slave[n].do_your_stuff(); 
}

Now inside each of the 32 threads, each one will be working on their own different "board[]" with code that is pretty much identical to the old original (fully debugged and optimized) code. I could even avoid the cut and past of the old code altogether by doing some #define tricks, i.e. having the function declarations written like this

void THREAD_OR_BASE board_manipulation_A(void);

and then run through this once with

#define THREAD_OR_BASE // zilch

and once with

#define THREAD_OR_BASE thread_type::

This way I can be quite certain that any time I make a modification to board_manipulation_A() it will appear both in the base thread version and the sub-thread one.

My questions are: A) Will it all work? B) Did I miss some vital step? C) could I have achieved the same thing with some simpler method?

Edit: instead of 32 threads, I should have said "as many threads as there are cores"

+2  A: 

If you don't need to merge thread result boards back in the same one, looks to me like a good strategy, making each thread have their own copy of the board and working on it, I don't see why it shouldn't work.

However, looks to me like the threads will perform a lot of cpu-bound operations, if thats true, you shouldn't have that many threads, it's better to have the same threads as cores your cpu has or few more, if more, they'll compete for the cpu resources, and your performance will be degraded.

Arkaitz Jimenez
Thank you. Your comment gives me confidence to go ahead and implement the scheme.
Mick
+1  A: 

Foreign code has often a bunch of global states and settings (maybe also in your case beside the board[]). These variables also have to be packed into your class, if used by your threads through the methods (excepting when they are read-only). Check also the static variables inside functions (local statics are actually also global as they exist once for the application) they are also not thread safe.

I guess you already did these checks. I hope this answer helps others wanting to make similar modifications to existing applications...

jdehaan
Thanks for the warnings. Yes there are other globals - I had simplified my explanation of the problem for the sake of clarity. But I hadn't given any thought to static variables declared in my functions - so thanks for the heads up.
Mick
+1  A: 

Work through a tutorial which explains inheritance in C++. It's not done with macros.

It's a design smell that you have to have more than one definition of a function, and even more that you are having to have multiple definitions based on who is calling the function ( the main thread's procedure or a slave thread's procedure ).

Instead of creating classes to describe the behaviour of a thread, make a Board class which has the two manipulations and the data for the board, along with any globals. It won't matter whether it's called in one thread or the other. If your board class has a UBYTE board[] member you shouldn't have to change the existing code that much, and you won't be abusing the language.

If you want to have a concurrent work parcel which does something different, call different functions in the procedure for that worker. Packaging up a board with the behaviour for a worker is a good idea, but have the worker call functions on a board object rather than using the low-level representation directly.

On the assumption that you have fewer than 32 processors, it's entirely possible that you may well not get much of an improvement - if the operations are fast you'll spend as much time creating OS threads and switching between them as you do doing anything useful. If you're feeling brave, look at creating only as many threads as you have CPU cores, and a queue of your 32 work parcels, so that each thread pulls the next parcel off the queue, runs it, an stores the result. That way you're not creating threads for small work items. I've tended to roll my own, but a cursory look at this threadpool seems easy to use for abstracting between 'these 32 tasks I want to run in parallel' and 'the OS features which allow me to run tasks on the different cores on my machine'.


One of the primary rules of OO design is to encapsulate behaviour with the data it operates on.

So a class to represent the data and operations pertaining to a board could look something like this:

class Board
{   
    private:
        UBYTE board[ BOARD_SIZE ]; 

    public 
        // ... suitable constructors    

        void manipulation_A ();
        void manipulation_B ();
};

You change to your existing code ( assuming that board is the only global data ) to make the functions member functions of the Board class:

void Board::manipulation_A () 
{
    ...
}

void Board::manipulation_B ()
{
    ...
}

Your work objects each have their own board to work on:

class WorkItemAAB
{
    private: 
        Board board;

    public:
        void run () {
            board.manipulation_A();
            board.manipulation_A();
            board.manipulation_B();
        }
};

or whatever sequence of manipulations you want.

You can call the functions of the board object from anywhere, and it only effects the data belonging to that object. There's no need to use a macro to switch between a local and a global board, or compile the same code twice.

Your main thread can do its manipulations, then create work items which each have their own copy of the board data, and push them off to separate threads / queue them with a higher level concurrent scheduler.

Pete Kirkham
was the word "smell" in the first sentence a typo?
Mick
There is no word "smell" in "Work through a tutorial which explains inheritance in C++." The phrase "design smell" is or "code smell" is very common. Using macros to turn C code into C++ or not, depending which thread you are calling it from smells like someone who doesn't understand the basics of object oriented programming in C++.
Pete Kirkham
The code is 6MB long, some of it dates back 25 years. I'm reluctant to have to change too much this is not an unlimited time project. Under my proposed scheme the main thread and its functions can change its version of "board[]" with a line like this: board[i] = x;just as it was in the original program.A slave thread can change its version of the "board[]" with a line like this: board[i] = x;As you can imagine this makes for extremely easy re-use of the vast amounts of existing (debugged and optimized) code. What would these two lines of code look like under your proposed system?
Mick
Your system seems completely sensible if my program was either small or I was starting afresh. But for my given circumstances it appears rather impractical.
Mick
The changes you need to make to make the functions work with a "Board" object are the same as the changes you would need to make to make them work with the "thread_type" class; the difference is that your proposal was to use macros to also allow the code to work as global functions, which will be more work, prone to bugs, and is neither a good design or remotely necessary.
Pete Kirkham