ansaurus

Question

Advice for converting a large monolithic single-threaded application to a multithreaded architecture?

Answer 1

+4 A:

You might just start out by breaking the UI and the work task into separate threads.

In your paint method, instead of calling getData() directly, put the request in a thread-safe queue. getData() runs in another thread that reads its requests from the queue. When the getData thread is done, it signals the main thread to redraw the visualisation area with the result data, using thread synchronization to pass the data.
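A minimal sketch of this arrangement in standard C++17 (not C++Builder). `BlockingQueue`, `Worker` and the `getData()` stand-in are hypothetical names, and using a negative request id as a shutdown sentinel is purely an illustrative choice. The UI thread pushes requests and drains results with the non-blocking `tryPop()`, so it never waits on the slow call.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>

// Minimal blocking queue guarded by a mutex and a condition variable.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }
    // Blocks until an item is available (used by the worker thread).
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }
    // Never blocks (used by the UI thread): empty optional if nothing is ready.
    std::optional<T> tryPop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

// Hypothetical stand-in for the application's slow getData() call.
std::string getData(int request) { return "result for " + std::to_string(request); }

struct Worker {
    BlockingQueue<int> requests;          // filled by the paint/UI code
    BlockingQueue<std::string> results;   // drained by the UI thread
    // Declared last so both queues exist before the thread starts.
    std::thread thread{[this] {
        for (;;) {
            int req = requests.pop();
            if (req < 0) break;           // negative id: shutdown sentinel
            results.push(getData(req));   // real code would also notify the UI here
        }
    }};
    ~Worker() {
        requests.push(-1);
        thread.join();
    }
};
```

In a real Windows app the worker would also post a message (for example with PostMessage) so the UI thread knows to call `tryPop()` and invalidate the visualisation area.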

While all this is going on, you of course have a progress bar saying "Reticulating splines..." so the user knows something is going on.

This would keep your UI snappy without the significant pain of multithreading your work routines (which can be akin to a total rewrite).

Byron Whitlock 2010-02-05 00:28:51

I was about to recommend more or less the same thing, except rather than "separate threads" it would be easier to implement "separate processes" and avoid the pain of checking every single C++ function for thread safety. Break out the GUI, calculation engine and data model into three separate components, and have them communicate by some sort of asynchronous message passing. The goal should be to run more than one calculation engine process in parallel.

James Anderson 2010-02-05 01:16:46

When your paint method queues a request, what does it paint while waiting for the request to be serviced?

John Knoeller 2010-02-05 01:24:08

To put it another way, Paint messages should never trigger database queries _at all_. User input should be what triggers the query, not paint.

John Knoeller 2010-02-05 01:57:16

Eventually you will end up splitting the application into modules that can run completely apart from each other. Maybe even in separate tasks (why limit yourself to threading?).

Warren P 2010-02-08 20:58:36

Answer 2

+4 A:

The main thing you have to do is to disconnect your UI from your data set. I'd suggest that the way to do that is to put a layer in between.

You will need to design a data structure of data cooked-for-display. This will most likely contain copies of some of your back-end data, but "cooked" to be easy to draw from. The key idea here is that this is quick and easy to paint from. You may even have this data structure contain calculated screen positions of bits of data so that it's quick to draw from.

Whenever you get a WM_PAINT message you should get the most recent complete version of this structure and draw from it. If you do this properly, you should be able to handle multiple WM_PAINT messages per second because the paint code never refers to your back-end data at all. It's just spinning through the cooked structure. The idea here is that it's better to paint stale data quickly than to hang your UI.

Meanwhile...

You should have two complete copies of this cooked-for-display structure. One is what the WM_PAINT handler looks at (call it cfd_A). The other is what you hand to your CookDataForDisplay() function (call it cfd_B). Your CookDataForDisplay() function runs in a separate thread and builds/updates cfd_B in the background. This function can take as long as it wants because it isn't interacting with the display in any way. Once the call returns, cfd_B will be the most up-to-date version of the structure.

Now swap cfd_A and cfd_B and call InvalidateRect on your application window.
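A rough sketch of the cfd_A/cfd_B handoff, assuming standard C++ threads; `CookedData`, `DisplayBuffers`, `cookDataForDisplay` and `startCooking` are illustrative names. One deliberate deviation from the answer: instead of recycling exactly two fixed buffers, each cook builds a fresh cfd_B inside a `std::shared_ptr`, so a paint that still holds the old cfd_A can never see it overwritten. Only one background cook at a time is assumed.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical cooked-for-display structure: everything paint needs, precomputed.
struct CookedData {
    int version = 0;
    std::vector<std::string> labels;
};

class DisplayBuffers {
public:
    // UI thread (WM_PAINT): take the most recent complete structure (cfd_A).
    // The lock is held only for a pointer copy, never while cooking.
    std::shared_ptr<const CookedData> forPaint() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return front_;
    }
    // Background thread: publish a finished cfd_B as the new cfd_A.
    // The old cfd_A is freed when the last painter holding it lets go.
    void publish(std::shared_ptr<const CookedData> fresh) {
        std::lock_guard<std::mutex> lock(mutex_);
        front_ = std::move(fresh);
    }
private:
    mutable std::mutex mutex_;
    std::shared_ptr<const CookedData> front_ = std::make_shared<CookedData>();
};

// CookDataForDisplay(): builds cfd_B off the UI thread, taking as long as it needs.
void cookDataForDisplay(DisplayBuffers& buffers, int version) {
    auto next = std::make_shared<CookedData>();
    next->version = version;
    for (int i = 0; i < 3; ++i) next->labels.push_back("item " + std::to_string(i));
    buffers.publish(std::move(next));
    // Real code would now call InvalidateRect on the window to request a repaint.
}

std::thread startCooking(DisplayBuffers& buffers, int version) {
    return std::thread([&buffers, version] { cookDataForDisplay(buffers, version); });
}
```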

A simplistic way to do this is to have your cooked-for-display structure be a bitmap, and that might be a good way to go to get the ball rolling, but I'm sure with a bit of thought you can do a much better job with a more sophisticated structure.

So, referring back to your example.

In the paint method, it will call a GetData method, often hundreds of times for hundreds of bits of data in one paint operation

This now becomes two threads: the paint method refers to cfd_A and runs on the UI thread, while cfd_B is being built by a background thread using GetData calls.

The quick-and-dirty way to do this is:

Take your current WM_PAINT code, stick it into a function called PaintIntoBitmap().
Create a bitmap and a Memory DC, this is cfd_B.
Create a thread and pass it cfd_B and have it call PaintIntoBitmap()
When this thread completes, swap cfd_B and cfd_A

Now your new WM_PAINT method just takes the pre-rendered bitmap in cfd_A and draws it to the screen. Your UI is now disconnected from your back-end GetData() function.

Now the real work begins, because the quick-and-dirty way doesn't handle window resizing very well. You can go from there to refine what your cfd_A and cfd_B structures are a little at a time until you reach a point where you are satisfied with the result.

John Knoeller 2010-02-05 00:38:25

Answer 3

+12 A:

You have a big challenge ahead of you. I had a similar challenge ahead of me: a 15-year-old monolithic single-threaded code base, not taking advantage of multicore, etc. We expended a great deal of effort trying to find a design and solution that would actually work.

Bad news first. It will be somewhere between impractical and impossible to make your single-threaded app multithreaded. A single-threaded app relies on its single-threadedness in ways both subtle and gross. One example is if the computation portion requires input from the GUI portion. The GUI must run in the main thread. If you try to get this data directly from the computation engine, you will likely run into deadlocks and race conditions that will require major redesigns to fix. Many of these dependencies will not crop up during the design phase, or even during the development phase, but only after a release build is put in a harsh environment.

More bad news. Programming multithreaded applications is exceptionally hard. It might seem fairly straightforward to just lock stuff and do what you have to do, but it is not. First of all, if you lock everything in sight you end up serializing your application, negating every benefit of multithreading in the first place while still adding all the complexity. Even if you get beyond this, writing a defect-free MP application is hard enough, but writing a high-performance MP application is that much more difficult. You could learn on the job in a kind of baptism by fire. But if you are doing this with production code, especially legacy production code, you put your business at risk.

Now the good news. You do have options that don't involve refactoring your whole app and will give you most of what you seek. One option in particular is easy to implement (in relative terms), and much less prone to defects than making your app fully MP.

You could instantiate multiple copies of your application. Make one of them visible, and all the others invisible. Use the visible application as the presentation layer, but don't do the computational work there. Instead, send messages (perhaps via sockets) to the invisible copies of your application which do the work and send the results back to the presentation layer.

This might seem like a hack. And maybe it is. But it will get you what you need without putting the stability and performance of your system at such great risk. Plus there are hidden benefits. One is that the invisible engine copies of your app will have access to their own virtual memory space, making it easier to leverage all the resources of the system. It also scales nicely. If you are running on a 2-core box, you could spin off 2 copies of your engine. 32 cores? 32 copies. You get the idea.
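The multi-instance idea can be sketched with POSIX `fork()` and pipes, where each forked child plays the part of an invisible engine copy. This is an assumption-laden illustration, not the answer's implementation: on Windows you would launch copies with CreateProcess and talk over sockets or named pipes instead, and `computeJob`, `Engine`, `startEngine` and `runOnEngines` are hypothetical names. The engine count is a parameter, matching the "one copy per core" point above.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical compute-heavy job executed by the invisible engine copies.
std::int64_t computeJob(std::int64_t n) {
    std::int64_t sum = 0;
    for (std::int64_t i = 1; i <= n; ++i) sum += i * i;
    return sum;
}

struct Engine { pid_t pid; int toEngine; int fromEngine; };

// fork() a copy of this process that serves jobs over a pair of pipes.
Engine startEngine(const std::vector<Engine>& others) {
    int req[2], res[2];
    if (pipe(req) != 0 || pipe(res) != 0) throw std::runtime_error("pipe failed");
    pid_t pid = fork();
    if (pid < 0) throw std::runtime_error("fork failed");
    if (pid == 0) {  // child: the engine copy
        // Drop inherited ends of earlier engines' pipes so they still see EOF later.
        for (const Engine& e : others) { close(e.toEngine); close(e.fromEngine); }
        close(req[1]);
        close(res[0]);
        std::int64_t job;
        while (read(req[0], &job, sizeof job) == static_cast<ssize_t>(sizeof job)) {
            std::int64_t out = computeJob(job);
            if (write(res[1], &out, sizeof out) != static_cast<ssize_t>(sizeof out)) break;
        }
        _exit(0);  // EOF: the presentation layer hung up
    }
    close(req[0]);
    close(res[1]);
    return {pid, req[1], res[0]};
}

// Presentation-layer side: farm jobs out to n engine processes, one round at a time.
std::vector<std::int64_t> runOnEngines(const std::vector<std::int64_t>& jobs, std::size_t n) {
    if (n == 0) throw std::invalid_argument("need at least one engine");
    std::vector<Engine> engines;
    for (std::size_t i = 0; i < n; ++i) engines.push_back(startEngine(engines));
    std::vector<std::int64_t> results(jobs.size());
    const auto size = static_cast<ssize_t>(sizeof(std::int64_t));
    for (std::size_t base = 0; base < jobs.size(); base += n) {
        std::size_t batch = std::min(n, jobs.size() - base);
        for (std::size_t e = 0; e < batch; ++e)  // send: engines now work in parallel
            if (write(engines[e].toEngine, &jobs[base + e], sizeof(std::int64_t)) != size)
                throw std::runtime_error("write failed");
        for (std::size_t e = 0; e < batch; ++e)  // collect results in order
            if (read(engines[e].fromEngine, &results[base + e], sizeof(std::int64_t)) != size)
                throw std::runtime_error("read failed");
    }
    for (const Engine& e : engines) {
        close(e.toEngine);  // engine reads EOF and exits
        close(e.fromEngine);
        waitpid(e.pid, nullptr, 0);
    }
    return results;
}
```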

John Dibling 2010-02-05 00:41:37

+1 I like the hack. Not pretty but it works.

Byron Whitlock 2010-02-05 00:57:48

+1 the message exchange could also be done using some message queue, for example MSMQ

mjustin 2010-02-05 06:32:37

If your app uses any scarce system resource (files, a database, a sound card, an EKG machine, etc.), multiple instances will fight over it. I once worked on an app that could have been multi-instance _except_ that the undo history would end up being shared across instances. Nearly all apps not designed to be multi-instance have issues like this.

John Knoeller 2010-02-05 10:36:28

+1. Multi-threading cannot be bolted on later. However, while you are at it, you might want to identify more goals than just the multi-threading, and then begin a wholesale "branch" (in version control) and "redesign" process. Why is multi-core important to you?

Warren P 2010-02-08 21:00:09

Answer 4

+2 A:

It sounds like you have several different issues that parallelism can address, but in different ways.

Performance increases through utilizing multicore CPU architectures

You're not taking advantage of the multi-core CPU architectures that are becoming so common. Parallelization allows you to divide work among multiple cores. You can write that code using standard C++ divide-and-conquer techniques with a "functional" style of programming, where you pass work to separate threads at the divide stage. Google's MapReduce pattern is an example of that technique. Intel has the new Cilk library to give you C++ compiler support for such techniques.
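A small divide-and-conquer sketch in the MapReduce spirit, using plain C++11 `std::thread` rather than Cilk (which is mentioned above but not shown here). `parallelSum` is a hypothetical example: each worker reduces its own chunk into its own slot, so the threads never write shared data, and the calling thread combines the partial results.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Divide: split the input into one chunk per worker.
// Map: each worker sums its own chunk into its own slot (no shared writes).
// Reduce: the calling thread adds up the partial results.
long long parallelSum(const std::vector<int>& data, unsigned workers) {
    workers = std::max(1u, workers);
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                         data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        });
    }
    for (std::thread& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```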

Greater GUI responsiveness through asynchronous document-view

By separating the GUI operations from the document operations and placing them on different threads, you can increase the apparent responsiveness of your application. The standard Model-View-Controller or Model-View-Presenter design patterns are a good place to start. You need to parallelize them by having the model inform the view of updates rather than have the view provide the thread on which the document computes itself. The View would call a method on the model asking it to compute a particular view of the data, and the model would inform the presenter/controller as information is changed or new data becomes available, which would get passed to the view to update itself.
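A minimal sketch of the model-notifies-presenter flow, assuming standard C++14 threads and `std::function` callbacks. `Model`, `subscribe` and `requestView` are invented for illustration, and a harmonic sum stands in for an expensive view computation. Listeners are called on the worker thread, so a real presenter would marshal the update to the UI thread before touching any controls.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical model that computes views of its data asynchronously and
// notifies subscribers (the presenter/controller) when a result is ready.
class Model {
public:
    using Listener = std::function<void(const std::string& viewName, double value)>;

    ~Model() { for (std::thread& t : workers_) t.join(); }

    void subscribe(Listener l) {
        std::lock_guard<std::mutex> lock(mutex_);
        listeners_.push_back(std::move(l));
    }

    // Called by the view; returns immediately instead of computing on the caller's thread.
    void requestView(std::string viewName, int n) {
        workers_.emplace_back([this, viewName = std::move(viewName), n] {
            double value = 0;
            for (int i = 1; i <= n; ++i) value += 1.0 / i;  // stand-in for expensive work
            std::vector<Listener> copy;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                copy = listeners_;
            }
            // Runs on the worker thread: a real presenter must hop to the UI thread.
            for (const Listener& l : copy) l(viewName, value);
        });
    }
private:
    std::mutex mutex_;
    std::vector<Listener> listeners_;
    std::vector<std::thread> workers_;  // only touched from the requesting (UI) thread
};
```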

Opportunistic caching and pre-calculation

It sounds like your application has a fixed base of data, but many possible compute-intensive views on the data. If you did a statistical analysis on which views were most commonly requested in what situations, you could create background worker threads to pre-calculate the likely-requested values. It may be useful to put these operations on low-priority threads so that they don't interfere with the main application processing.
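One way to sketch opportunistic pre-calculation, assuming `std::async` and a map of `std::shared_future` values: the hypothetical `ViewCache` computes each view at most once, whether a background `precalculate()` or the user's request asks first. Running these tasks at low priority is platform-specific (for example SetThreadPriority on Windows) and is not shown.

```cpp
#include <future>
#include <map>
#include <mutex>

// Hypothetical cache of expensive views keyed by an integer view id.
// Speculative pre-calculation and on-demand requests share one shared_future per key.
class ViewCache {
public:
    std::shared_future<long long> get(int viewId) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cache_.find(viewId);
        if (it != cache_.end()) return it->second;  // already computed or in flight
        std::shared_future<long long> f =
            std::async(std::launch::async, &ViewCache::compute, viewId).share();
        cache_.emplace(viewId, f);
        return f;
    }
    // Called for views that usage statistics say are likely to be requested soon.
    void precalculate(int viewId) { get(viewId); }
private:
    static long long compute(int viewId) {  // stand-in for a compute-intensive view
        long long acc = 0;
        for (int i = 0; i <= viewId; ++i) acc += static_cast<long long>(i) * i;
        return acc;
    }
    std::mutex mutex_;
    std::map<int, std::shared_future<long long>> cache_;
};
```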

Obviously, you'll need to use mutexes (or critical sections), events, and probably semaphores to implement this. You may find some of the new synchronization objects in Vista useful, like the slim reader-writer lock, condition variables, or the new thread pool API. See Joe Duffy's book on concurrency for how to use these basic techniques.
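As a portable counterpart to the slim reader-writer lock, C++17's `std::shared_mutex` lets many readers proceed together while writers take exclusive access. The `SharedTable` class and `fillConcurrently` helper are hypothetical, meant only to show the locking pattern.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

// Many readers (e.g. paint code) may look up values at once; writers are exclusive.
class SharedTable {
public:
    void set(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive access
        table_[key] = value;
    }
    int getOr(const std::string& key, int fallback) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared access
        auto it = table_.find(key);
        return it == table_.end() ? fallback : it->second;
    }
private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> table_;
};

// Usage sketch: several writer threads updating the table concurrently.
void fillConcurrently(SharedTable& t, int n) {
    std::vector<std::thread> writers;
    for (int i = 0; i < n; ++i)
        writers.emplace_back([&t, i] { t.set("k" + std::to_string(i), i * 10); });
    for (std::thread& w : writers) w.join();
}
```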

David Gladfelter 2010-02-05 00:43:00

Taking advantage of multicore processors is one key reason for doing this. Unfortunately Cilk looks like it's not compatible with our compiler (I added more info on this in an edit just now). Re MVP: thanks. That's a good summary of a design I think is worth investigating: I hadn't thought of having the model notify the view quite like that. Our current design is not MVP-ish at all, something I'd like to change anyway. Thanks for the book recommendation too!

David M 2010-02-05 01:13:19

Answer 5

+10 A:

So, there's a hint in your description of the algorithm as to how to proceed:

often quite a complex data flow - think of this as data flowing through a complex graph, each node of which performs operations

I'd look into making that data-flow graph be literally the structure that does the work. The links in the graph can be thread-safe queues, the algorithms at each node can stay pretty much unchanged, except wrapped in a thread that picks up work items from a queue and deposits results on one. You could go a step further and use sockets and processes rather than queues and threads; this will let you spread across multiple machines if there is a performance benefit in doing this.
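A sketch of the data-flow graph as threads joined by thread-safe queues, assuming C++14. `Request`, `Edge`, `node` and `runPipeline` are hypothetical names, and the two lambda stages stand in for the existing node algorithms. Work items travel as `std::unique_ptr`, so exactly one stage owns a request at any moment, and a null pointer marks end of stream.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// A work item. Only the stage currently holding it reads or writes its fields.
struct Request {
    int input = 0;
    int scaled = 0;
    int result = 0;
};
using RequestPtr = std::unique_ptr<Request>;  // nullptr marks end of stream

// Thread-safe queue used as an edge of the data-flow graph.
class Edge {
public:
    void push(RequestPtr r) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(r)); }
        cv_.notify_one();
    }
    RequestPtr pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        RequestPtr r = std::move(q_.front());
        q_.pop();
        return r;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<RequestPtr> q_;
};

// A graph node: an unchanged algorithm wrapped in a thread moving items from in to out.
template <typename Op>
std::thread node(Edge& in, Edge& out, Op op) {
    return std::thread([&in, &out, op] {
        for (RequestPtr r = in.pop(); r; r = in.pop()) {
            op(*r);
            out.push(std::move(r));  // hand ownership to the next stage
        }
        out.push(nullptr);  // forward end of stream downstream
    });
}

// source -> (scale) -> middle -> (offset) -> sink
std::vector<int> runPipeline(const std::vector<int>& inputs) {
    Edge source, middle, sink;
    std::thread a = node(source, middle, [](Request& r) { r.scaled = r.input * 2; });
    std::thread b = node(middle, sink, [](Request& r) { r.result = r.scaled + 1; });
    for (int x : inputs) {
        auto r = std::make_unique<Request>();
        r->input = x;
        source.push(std::move(r));
    }
    source.push(nullptr);
    std::vector<int> results;
    for (RequestPtr r = sink.pop(); r; r = sink.pop()) results.push_back(r->result);
    a.join();
    b.join();
    return results;
}
```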

Then your paint and other GUI methods need to be split in two: one half to queue the work, and the other half to draw or use the results as they come out of the pipeline.

This may not be practical if the app presumes that data is global. But if it is well contained in classes, as your description suggests it may be, then this could be the simplest way to get it parallelised.

Andrew McGregor 2010-02-05 00:54:29

This is an interesting approach I hadn't thought of. The program is not always well contained - it is an old codebase - but this node-based area is actually one of the cleanest designed areas of the program and so this is well worth investigating. Thanks!

David M 2010-02-05 01:07:53

I heartily endorse this approach. Not only does it provide a good structure, but, carefully done, it minimizes data contention and the need for locking (thus maximizing the benefit of modern multicore CPUs and hyperthreading). The key trick is to only allow the worker threads to access data in the request object, and to place their results there too. Since only one thread owns the request at a given time, there is no data contention (apart from locks on the queues to make insertion/removal atomic).

Stephen C. Steel 2010-06-04 17:37:08

Marked as the answer since this approach is the closest to what we've gone with, after a lot of research / prototyping / other work. (Dthorpe's answer below is also insightful / useful, as are several other answers.) Thanks Andrew!

David M 2010-08-10 07:35:14

Answer 6

+1 A:

You can also look at this article from Herb Sutter: "You have a mass of existing code and want to add concurrency. Where do you start?"

Jagannath 2010-02-05 01:16:24

It's a good article; however, the technique presented in it won't help with the biggest problem in the application, the synchronous calls that may take anything from milliseconds to hours. Cutting the time a `WM_PAINT` handler takes from 4 hours to 1 hour on a 4-core machine simply isn't good enough. The calculations need to be performed asynchronously, and all code paths for painting and user interaction need to complete in a few hundred milliseconds at most.

mghie 2010-02-09 12:09:03

Answer 7

+1 A:

Here's what I would do...

I would start by profiling your app and seeing:

1) what is slow and what the hot paths are
2) which calls are reentrant or deeply nested

You can use 1) to determine where the opportunity is for speedups and where to start looking for parallelization.

You can use 2) to find out where the shared state is likely to be and get a deeper sense of how much things are tangled up.

I would use a good system profiler and a good sampling profiler (like the Windows Performance Toolkit or the concurrency views of the profiler in Visual Studio 2010 Beta 2; these are both 'free' right now).

Then I would figure out what the goal is and how to separate things gradually into a cleaner design that is more responsive (moving work off the UI thread) and more performant (parallelizing computationally intensive portions). I would focus on the highest-priority and most noticeable items first.

If you don't have a good refactoring tool like VisualAssist, invest in one - it's worth it. If you're not familiar with Michael Feathers or Kent Beck's refactoring books, consider borrowing them. I would ensure my refactorings are well covered by unit tests.

If you can't move to VS (if you can, I would recommend the products I work on, the Asynchronous Agents Library and the Parallel Pattern Library), you can also use TBB or OpenMP.

In Boost, I would look carefully at boost::thread, the asio library and the signals library.

I would ask for help / guidance / a listening ear when I got stuck.

-Rick

Rick 2010-02-05 03:20:31

+1. If you could just profile your app, find a few optimizations and gain X% performance without changing the whole architecture, I would sure do that first!

Warren P 2010-02-08 21:12:06

It's not always that easy, particularly if the code is already tuned a bunch, but it is always worth it to profile first and understand what is going on.

Rick 2010-02-09 05:30:59

The code is indeed profiled and tuned, and our biggest bottleneck now appears to be the use of a single core. A profiler is very handy for finding out where to make use of multiple cores among all that, though!

David M 2010-06-08 02:14:52

Just a piece of pointed feedback, after re-reading this post: I would first take the work off the UI thread and make it asynchronous, likely registering a custom window message and using PostMessage to let the UI thread know that it's done. I'll reiterate my advice to look closely at boost::signals and asio; these can help you convert your app into a data-flow pipeline. Good luck!

Rick 2010-06-08 06:59:57

Answer 8

+2 A:

There is something that no-one has talked about yet, but which is quite interesting.

It's called futures. A future is the promise of a result... let's see with an example.

std::future<int> leftVal = std::async(std::launch::async, computeLeftValue, treeNode); // [1]
int rightVal = computeRightValue(treeNode); // [2]
int result = leftVal.get() + rightVal; // [3]

It's pretty simple:

You spin off a thread that starts computing leftVal (taking it from a pool, for example, to avoid the cost of creating a new thread).
While leftVal is being computed, you compute rightVal.
You add the two; this may block until the computation of leftVal ends, if it has not finished yet.

The great benefit here is that it's straightforward: whenever you have two independent computations whose results you then combine, you can use this pattern.
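A self-contained version of the pattern using C++11's `std::async`, which arrived with C++0x as anticipated here. The tree type and the `computeValue`/`computeParallel` names are hypothetical, and whether `std::launch::async` reuses pooled threads is up to the implementation.

```cpp
#include <future>
#include <memory>

// Hypothetical binary tree node.
struct TreeNode {
    int value = 0;
    std::unique_ptr<TreeNode> left, right;
};

// Sequential subtree sum.
int computeValue(const TreeNode* node) {
    if (!node) return 0;
    return node->value + computeValue(node->left.get()) + computeValue(node->right.get());
}

// [1] start the left half asynchronously, [2] compute the right half on this thread,
// [3] join with get(), which blocks only if the left half is not finished yet.
int computeParallel(const TreeNode* node) {
    if (!node) return 0;
    std::future<int> leftVal =
        std::async(std::launch::async, computeValue, node->left.get());  // [1]
    int rightVal = computeValue(node->right.get());                      // [2]
    return node->value + leftVal.get() + rightVal;                       // [3]
}
```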

See Herb Sutter's article on futures. They will be available in the upcoming C++0x, but there are already libraries available today, even if the syntax is perhaps not as pretty as I would make you believe ;)

Matthieu M. 2010-02-05 08:03:33

Answer 9

+2 A:

If it was my development dollars I was spending, I would start with the big picture:

What do I hope to accomplish, how much will I spend to accomplish it, and how will I be further ahead? (If the answer is "my app will run 10% better on quad-core PCs", and I could have achieved the same result by spending $1000 more per customer PC and $100,000 less this year on R&D, then I would skip the whole effort.)
Why am I doing multi-threaded instead of massively parallel distributed? Do I really think threads are better than processes? Multi-core systems also run distributed apps pretty well. And there are some advantages to message-passing, process-based systems that go beyond the benefits (and the costs!) of threading. Should I consider a process-based approach? Should I consider a background engine running entirely as a service, and a foreground GUI? Since my product is node-locked and licensed, I think services would suit me (the vendor) quite well. Also, separating stuff into two processes (background service and foreground GUI) just might force the kind of rewrite and rearchitecting that I would not be forced into if I just added threading to the mix.
This is just to get you thinking: What if you were to rewrite it as a service (background app) and a GUI, because that would actually be easier than adding threading, without also adding crashes, deadlocks, and race conditions?
Consider the idea that for your needs, perhaps threading is evil. Develop your religion, and stick with it. Unless you have a really good reason to go the other way. For many years, I religiously avoided threading, because one thread per process is good enough for me.

I don't see any really solid reasons in your list why you need threading, except ones that could be solved more cheaply by buying faster target hardware. If your app is "too slow", adding threads might not even speed it up.
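Amdahl's law (not cited in the original answer) quantifies why adding threads might not speed things up much: if a fraction p of the running time can be parallelised across N cores, the best possible speedup is bounded as follows. The worked numbers assume p = 0.5 purely for illustration.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad
S(4)\Big|_{p = 0.5} = \frac{1}{0.5 + 0.125} = 1.6, \qquad
\lim_{N \to \infty} S(N)\Big|_{p = 0.5} = \frac{1}{1 - 0.5} = 2 .
```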

I use threads for background serial communications, but I would not consider threading merely for computationally heavy applications, unless my algorithms were so inherently parallel as to make the benefits clear, and the drawbacks minimal.

I wonder if the "design" problems that this C++Builder app has are like my Delphi "RAD Spaghetti" application disease. I have found that a wholesale refactor/rewrite (over a year per major app that I have done this to) was the minimum amount of time for me to get a handle on application "accidental complexity". And that was without throwing in a "threads where possible" idea. I tend to write my apps with threads only for serial communication and network socket handling, and maybe the odd "worker-thread queue".

If there is a place in your app you can add ONE thread, to test the waters, I would look for the main "work queue" and I would create an experimental version control branch, and I would learn about how my code works by breaking it in the experimental branch. Add that thread. And see where you spend your first day of debugging. Then I might just abandon that branch and go back to my trunk until the pain in my temporal lobe subsides.

Warren

Warren P 2010-02-08 21:07:45

Answer 10

A:

It is hard to give you proper guidelines. But...

In my opinion the easiest way out is to convert your application to an ActiveX EXE. COM has support for threading built right into it, so your program will automatically become a multithreaded application. Of course you will have to make quite a few changes to your code, but this is the shortest and safest way to go.

I am not sure but probably RichClient Toolset lib may do the trick for you. On the site the author has written:

It also offers registration free Loading/Instancing-capabilities for ActiveX-Dlls and new, easy to use Threading-approach, which works with Named-Pipes under the hood and works therefore also cross-process.

Please check it out. Who knows it may be the right solution for your requirements.

As for project management, I think you can continue using what is provided in your IDE of choice by integrating it with SVN through plugins.

I forgot to mention that we have completed an application for the share market that automatically trades (buys and sells based on lows and highs) the scripts in the user's portfolio, based on an algorithm that we have developed.

While developing this software we faced the same kind of problem you have illustrated here. To solve it we converted our application into an ActiveX EXE and converted all the parts that need to execute in parallel into ActiveX DLLs. We did not use any third-party libs for this!

HTH

Yogi Yang 007 2010-02-09 11:18:52

-1. Please explain how moving code to DLLs will make the execution asynchronous and concurrent.

mghie 2010-02-09 11:59:08

What we did was encapsulate all the code that gets executed for more than one event simultaneously. In our case that was when a script's price is updated, which at times happens simultaneously for more than one script. After separating this code, we created an ActiveX DLL and set the most important property as follows: Instancing = MultiUse. Now for every price update event we create a new instance of the ActiveX object and pass it the necessary values. That is it! This will automatically spawn a new thread for every instance that we create. In our case we never need to be notified by the thread.

Yogi Yang 007 2010-02-12 05:05:19

Answer 11

A:

I hope this will help you understand how to convert your monolithic single-threaded app to a multithreaded one easily. Sorry, it is for another programming language, but nevertheless the principles explained are the same everywhere.

http://www.freevbcode.com/ShowCode.Asp?ID=1287

Hope this helps.

Yogi Yang 007 2010-02-18 11:14:55

Answer 12

A:

Well, I think you're expecting a lot based on your comments here. You're not going to go from minutes to milliseconds by multithreading. The most you can hope for is the current amount of time divided by the number of cores. That being said, you're in a bit of luck with C++. I've written high-performance multiprocessor scientific apps, and what you want to look for is the most embarrassingly parallel loop you can find. In my scientific code, the heaviest piece is calculating somewhere between 100 and 1000 data points. However, all of the data points can be calculated independently of the others. You can then split the loop using OpenMP. This is the easiest and most efficient way to go. If your compiler doesn't support OpenMP, then you will have a very hard time porting existing code. With OpenMP (if you're lucky), you may only have to add a couple of #pragmas to get 4-8x the performance. Here's an example: StochFit.
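A sketch of the embarrassingly parallel loop described above, using OpenMP's `parallel for` directive; `computePoint` is a hypothetical stand-in for one independent data point. Compiled with OpenMP enabled (for example `-fopenmp` on GCC), the iterations are split across cores; without it, the pragma is ignored and the same code runs serially, which makes the approach easy to adopt incrementally.

```cpp
#include <vector>

// Hypothetical per-point calculation: depends only on its own index.
double computePoint(int i) {
    return static_cast<double>(i) * i;
}

std::vector<double> computeAll(int n) {
    std::vector<double> out(n);
    // Each iteration writes a distinct element, so no locking is needed.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        out[i] = computePoint(i);
    }
    return out;
}
```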

Steve 2010-02-18 11:27:01

You can't change the calculation time from minutes to milliseconds by multithreading, but you certainly can make an improvement of this size in the responsiveness of the GUI by moving slow calculations to another thread. Because Windows serializes drawing operations internally, all GUI handling should be done from the main thread.

Stephen C. Steel 2010-06-04 17:46:54

Answer 13

A:

The first thing you must do is to separate your GUI from your data, the second is to create a multithreaded class.

STEP 1 - Responsive GUI

We can assume that the image you are producing is contained in the canvas of a TImage. You can put a simple TTimer in your form and write code like this in its event handler:

if (CurrentData.LastUpdate>CurrentUpdate)
    {
    Image1->Canvas->Draw(0,0,CurrentData.Bitmap);
    CurrentUpdate=Now();
    }

OK, I know, it's a little bit dirty, but it's fast and simple. The points are:

You need an object that is created in the main thread.
The object is copied into the form that needs it, only when needed and in a safe way (OK, the Bitmap may need better protection, but for simplicity...).
The object CurrentData is your actual project, single-threaded, which produces an image.

Now you have a fast and responsive GUI. If your algorithm is slow, the refresh is slow, but your user will never think that your program has frozen.

STEP 2 - Multithread

I suggest you to implement a class like the following:

SimpleThread.h

typedef void (__closure *TThreadFunction)(void* Data);

class TSimpleThread : public TThread
{
public:
    TSimpleThread( TThreadFunction _Action,void* _Data = NULL, bool RunNow = true );
    void AbortThread();

    __property Terminated; 

protected:
    TThreadFunction ThreadFunction;
    void*           Data;

private:
    virtual void __fastcall Execute() { ThreadFunction(Data); };
};

SimpleThread.cpp

TSimpleThread::TSimpleThread( TThreadFunction _Action,void* _Data, bool RunNow)
             : TThread(true), // initialize suspended
               ThreadFunction(_Action), Data(_Data)
{
FreeOnTerminate = false;
if (RunNow) Resume();
}

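void TSimpleThread::AbortThread()
{
// Suspend() + Free() on a running thread can leave locks held and the caller's
// pointer dangling. Request a cooperative stop instead: ThreadFunction should
// poll ST->Terminated. The owner deletes the object afterwards (FreeOnTerminate is false).
Terminate();
WaitFor();
}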

To use it, in your single-threaded class you can create an object like this:

TSimpleThread *ST;
ST=new TSimpleThread( RefreshFunction,NULL,false); // create suspended
ST->Resume();                                      // then start it

To explain further: in your own monolithic class you have now created a thread, moving a function (i.e. RefreshFunction) onto a separate thread. The scope of your function is the same and the class is the same; only the execution is separate.

Redax 2010-06-04 17:04:28

Answer 14

+3 A:

Don't attempt to multithread everything in the old app. Multithreading for the sake of saying it's multithreaded is a waste of time and money. You're building an app that does something, not a monument to yourself.
Profile and study your execution flows to figure out where the app spends most of its time. A profiler is a great tool for this, but so is just stepping through the code in the debugger. You find the most interesting things in random walks.
Decouple the UI from long-running computations. Use cross-thread communications techniques to send updates to the UI from the computation thread.
As a side-effect of #3: think carefully about reentrancy: now that the compute is running in the background and the user can smurf around in the UI, what things in the UI should be disabled to prevent conflicts with the background operation? Allowing the user to delete a dataset while a computation is running on that data is probably a bad idea. (Mitigation: computation makes a local snapshot of the data) Does it make sense for the user to spool up multiple compute operations concurrently? If handled well, this could be a new feature and help rationalize the app rework effort. If ignored, it will be a disaster.
Identify specific operations that are candidates to be shoved into a background thread. The ideal candidate is usually a single function or class that does a lot of work (requires a "lot of time" to complete - more than a few seconds) with well defined inputs and outputs, that makes use of no global resources, and does not touch the UI directly. Evaluate and prioritize candidates based on how much work would be required to retrofit to this ideal.
In terms of project management, take things one step at a time. If you have multiple operations that are strong candidates to be moved to a background thread, and they have no interaction with each other, these might be implemented in parallel by multiple developers. However, it would be a good exercise to have everybody participate in one conversion first so that everyone understands what to look for and to establish your patterns for UI interaction, etc. Hold an extended whiteboard meeting to discuss the design and process of extracting the one function into a background thread. Go implement that (together or dole out pieces to individuals), then reconvene to put it all together and discuss discoveries and pain points.
Multithreading is a headache and requires more careful thought than straight up coding, but splitting the app into multiple processes creates far more headaches, IMO. Threading support and available primitives are good in Windows, perhaps better than some other platforms. Use them.
In general, don't do any more than what is needed. It's easy to severely over implement and over complicate an issue by throwing more patterns and standard libraries at it.
If nobody on your team has done multithreading work before, budget time to grow an expert or funds to hire one as a consultant.
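The decoupling, snapshot and reentrancy points above can be sketched together, assuming standard C++14 threads. The hypothetical `Document` runs its computation on a private snapshot, so the UI may keep editing; reports progress through an atomic the UI can poll (for example from a timer); and refuses a second concurrent run via a busy flag the UI can use to disable commands. All names are illustrative.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class Document {
public:
    ~Document() { wait(); }

    // UI thread: start a background computation unless one is already running.
    bool startSum() {
        bool expected = false;
        if (!busy_.compare_exchange_strong(expected, true)) return false;
        progress_.store(0);
        std::vector<int> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = data_;  // compute on a private copy, not the live data
        }
        if (worker_.joinable()) worker_.join();  // reap the previous, finished run
        worker_ = std::thread([this, snapshot = std::move(snapshot)] {
            long long sum = 0;
            for (std::size_t i = 0; i < snapshot.size(); ++i) {
                sum += snapshot[i];
                progress_.store(static_cast<int>(100 * (i + 1) / snapshot.size()));
            }
            result_.store(sum);
            busy_.store(false);  // UI may re-enable its "compute" commands
        });
        return true;
    }
    // UI thread: editing is safe mid-computation because the worker has its own copy.
    void setData(std::vector<int> d) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_ = std::move(d);
    }
    int progressPercent() const { return progress_.load(); }
    bool busy() const { return busy_.load(); }
    long long result() const { return result_.load(); }
    void wait() { if (worker_.joinable()) worker_.join(); }
private:
    std::mutex mutex_;
    std::vector<int> data_;
    std::atomic<bool> busy_{false};
    std::atomic<int> progress_{0};
    std::atomic<long long> result_{0};
    std::thread worker_;  // only touched from the UI thread
};
```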

dthorpe 2010-06-04 17:41:34

That looks very good advice, Danny - thanks! By the way, it's rather cool getting a reply from you - I have a copy of Delphi Component Design on my desk as I type.

David M 2010-06-08 02:18:19

Hey! So do I... Somewhere under all this stuff... ;>

dthorpe 2010-06-08 05:31:23
