tags: 

views: 325

answers: 5

I am building an app that will do some object tracking from a video camera feed and use information from that to run a particle system in OpenGL. The code to process the video feed is somewhat slow, 200 - 300 milliseconds per frame right now. The system this will be running on has a dual core processor. To maximize performance I want to offload the camera processing to one processor and just communicate relevant data back to the main app as it becomes available, while leaving the main app kicking along on the other processor.

What do I need to do to offload the camera work to the other processor and how do I handle communication with the main app?

Edit: I am running Windows 7 64bit

+1  A: 

You need some kind of framework for handling multiple cores. OpenMP seems like a fairly simple choice.

Kornel Kisielewicz
You could also just use pthreads or whatever the OS already provides.
Pestilence
@Pestilence - yes, although I stick to proposing cross-platform solutions :)
Kornel Kisielewicz
lol. pthreads on cygwin then! :)
Pestilence
@Pestilence - cygwin is windows-only :P
Kornel Kisielewicz
+2  A: 

I would recommend against OpenMP; it is geared more toward numerical code than the producer/consumer model you seem to have.

I think you can do something simple using Boost threads: spawn a worker thread, share a common segment of memory (for communicating the acquired data), and use some notification mechanism to tell you when your data is available (look into Boost thread interrupts).
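A minimal sketch of that setup, assuming Boost.Thread is available; the shared struct, its field names, and the frame-processing placeholder are illustrative rather than taken from the question, and a condition variable stands in for the notification mechanism:

#include <boost/thread.hpp>
#include <boost/thread/condition_variable.hpp>

struct TrackingResult {          // the shared "segment of memory"
    float x, y;                  // hypothetical tracked object position
    bool  fresh;                 // worker sets it, main thread clears it
};

TrackingResult            g_result = { 0.0f, 0.0f, false };
boost::mutex              g_mutex;
boost::condition_variable g_cond;

void cameraWorker()              // runs on its own thread
{
    for (;;) {
        float x = 0.0f, y = 0.0f;
        // ... capture a frame and run the 200-300 ms tracking step, filling x/y ...

        boost::unique_lock<boost::mutex> lock(g_mutex);
        g_result.x = x;
        g_result.y = y;
        g_result.fresh = true;
        g_cond.notify_one();     // signal that new data is available
    }
}

int main()
{
    boost::thread worker(cameraWorker);

    for (;;) {                   // main/render loop: polls instead of blocking;
        {                        // a blocking consumer could g_cond.wait(lock) instead
            boost::unique_lock<boost::mutex> lock(g_mutex);
            if (g_result.fresh) {
                // feed g_result into the particle system here
                g_result.fresh = false;
            }
        }
        // ... update and draw the particles ...
    }
}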

I do not know what kind of processing you do, but you may want to take a look at Intel Threading Building Blocks and Intel Integrated Performance Primitives; they have several functions for video processing which may be faster (assuming they cover your functionality).

aaa
Considering that a split between consumer and producer would gain almost nothing in terms of efficiency, I think he'll need parallel processing of the camera data anyway. And there's no easy way to guarantee that a second boost::thread will spawn on the other core...
Kornel Kisielewicz
MPI would be more practicable in this case I think.
tur1ng
Without knowing the details it's hard to say, but in general I do agree with you. In Linux land there is the cpuset utility that controls thread placement, though I have not used it myself. In my opinion MPI would be overkill; I do not think Mr. Bell intends to run on multiple nodes.
aaa
+5  A: 

Basically, you need to multithread your application. Each thread of execution can only saturate one core. Separate threads tend to be run on separate cores. If you are insistent that each thread ALWAYS execute on a specific core, then each operating system has its own way of specifying this (affinity masks & such)... but I wouldn't recommend it.
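For reference only, since the questioner is on Windows: pinning a thread there is done with SetThreadAffinityMask. A tiny sketch (the choice of core 0 is arbitrary):

#include <windows.h>

// Pin the calling thread to the first core (mask bit 0). Shown for completeness;
// as noted above, forcing affinity is usually not recommended.
void pinCurrentThreadToCore0()
{
    SetThreadAffinityMask(GetCurrentThread(), 1);
}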

OpenMP is great, but it's a tad fat in the ass, especially when joining back up from a parallelization. YMMV. It's easy to use, but not at all the best performing option. It also requires compiler support.

If you're on Mac OS X 10.6 (Snow Leopard), you can use Grand Central Dispatch. It's interesting to read about even if you don't use it, as its design implements some best practices. It also isn't optimal, but it's better than OpenMP, even though it also requires compiler support.

If you can wrap your head around breaking up your application into "tasks" or "jobs," you can shove these jobs down as many pipes as you have cores. Think of batching your processing as atomic units of work. If you can segment it properly, you can run your camera processing on both cores, and your main thread at the same time.

If communication is minimized for each unit of work, then your need for mutexes and other locking primitives will be minimized. Coarse-grained threading is much easier than fine-grained. And, you can always use a library or framework to ease the burden. Consider Boost's Thread library if you take the manual approach. It provides portable wrappers and a nice abstraction.
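A rough sketch of that kind of coarse-grained job queue on top of Boost.Thread; the Job typedef, the worker count, and processNextFrame are assumptions for illustration, not part of this answer:

#include <boost/thread.hpp>
#include <boost/function.hpp>
#include <queue>

typedef boost::function<void()> Job;

class JobQueue {
public:
    void push(const Job& job)
    {
        boost::unique_lock<boost::mutex> lock(mutex_);
        jobs_.push(job);
        cond_.notify_one();
    }

    Job pop()                    // blocks until a job is available
    {
        boost::unique_lock<boost::mutex> lock(mutex_);
        while (jobs_.empty())
            cond_.wait(lock);
        Job job = jobs_.front();
        jobs_.pop();
        return job;
    }

private:
    std::queue<Job>           jobs_;
    boost::mutex              mutex_;
    boost::condition_variable cond_;
};

void worker(JobQueue* queue)
{
    for (;;)
        queue->pop()();          // run jobs as they arrive
}

// Usage: spawn one worker per core and push frame-processing jobs at it.
// JobQueue queue;
// boost::thread t1(worker, &queue);
// boost::thread t2(worker, &queue);
// queue.push(&processNextFrame);   // processNextFrame is hypothetical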

Pestilence
+1 even if only for "tad fat in the ass".
GMan
+1  A: 

It depends on how many cores you have. If you have only 2 cores (CPUs, processors, hyperthreads, you know what I mean), then OpenMP cannot give a tremendous increase in performance, but it will help. The maximum gain you can have is to divide your time by the number of processors, so it will still take 100 - 150 ms per frame.

The equation (Amdahl's law) is:
parallel time = ([total time for the task] - [time in code that cannot be parallelized]) / [number of CPUs] + [time in code that cannot be parallelized]
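For example, with illustrative numbers rather than measurements from the question: if the whole 300 ms per frame could be parallelized, two cores would give 300 / 2 = 150 ms; if, say, 50 ms of it cannot be, the limit becomes (300 - 50) / 2 + 50 = 175 ms.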

Basically, OpenMP rocks at parallel loop processing. It's rather easy to use:

// assuming int a[N] has been declared earlier
#pragma omp parallel for
for (int i = 0; i < N; i++)
    a[i] = 2 * i;

and bang, your for loop is parallelized. It does not work for every case; not every algorithm can be parallelized this way, but many can be rewritten (hacked) to be compatible. The key principle is Single Instruction, Multiple Data (SIMD): applying the same convolution code to multiple pixels, for example.
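For instance, a per-pixel pass over a frame parallelizes the same way; the grayscale buffer layout and the "brighten by 20" step below are made up for illustration, not taken from the question:

#include <vector>

// Hypothetical per-pixel pass: brighten every pixel of a grayscale frame.
// Only the outer loop is split across threads by OpenMP.
void brighten(std::vector<unsigned char>& pixels, int width, int height)
{
    #pragma omp parallel for
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            int v = pixels[y * width + x] + 20;
            pixels[y * width + x] = (unsigned char)(v > 255 ? 255 : v);
        }
}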

But simply applying this cookbook recipe goes against the rules of optimization:
1- Benchmark your code.
2- Find the REAL bottlenecks with "scientific" evidence (numbers) instead of simply guessing where you think there is a bottleneck.
3- If the bottleneck really is in processing loops, then OpenMP is for you.

Maybe simple optimizations on your existing code can give better results, who knows?

Another road would be to run OpenGL in one thread and the data processing in another thread. This will help a lot if OpenGL or your particle rendering system takes a lot of power, but remember that threading can lead to other kinds of synchronization bottlenecks.

Eric
A: 

Like Pestilence said, you just need your app to be multithreaded. Lots of frameworks like OpenMP have been mentioned, so here's another one:

Intel Threading Building Blocks

I've never used it before, but I hear great things about it.

Hope this helps!

blwy10