views:

830

answers:

5

I'm trying to use parallelization to improve the refresh rate for drawing a 3D scene with heirarchically ordered objects. The scene drawing algorithm first recursively traverses the tree of objects, and from that, builds an ordered array of essential data that is needed to draw the scene. Then it traverses that array multiple times to draw objects/overlays, etc. Since from what I've read OpenGL isn't a thread-safe API, I assume the array traversal/drawing code must be done on the main thread, but I'm thinking that I might be able to parallelize the recursive function that fills the array. The key catch is that the array must be populated in the order that the objects occur in the scene, so all functionality that associates a given object with an array index must be done in the proper order, but once the array index has been assigned, I can fill the data of that array element (which isn't necessarily a trivial operation) using worker threads. So here's the pseudo code that I'm trying to get at. I hope you get the idea of the xml-ish thread syntax.

recursivepopulatearray(theobject)
{
  <main thread>
  for each child of theobject
  {
     assign array index
     <child thread(s)>
       populate array element for child object
     </child thread(s)>
     recursivepopulatearray(childobject)
  }
  </main thread>
}

So, is it possible to do this using OpenMP, and if so, how? Are there other parallelization libraries that would handle this better?

Addendum: In response to Davide's request for more clarification, let me explain a little more in detail. Let's say that the scene is ordered like this:

-Bicycle Frame
  - Handle Bars 
  - Front Wheel
  - Back Wheel
-Car Frame
  - Front Left Wheel
  - Front Right Wheel
  - Back Left Wheel
  - Back Right Wheel

Now, each of these objects has lots of data associated with it, i.e. location, rotation, size, different drawing parameters, etc. Additionally, I need to make multiple passes over this scene to draw it properly. One pass draws the shapes of the objects, another pass draws text describing the objects, another pass draws connections/associations between the objects if there are any. Anyway, getting all the drawing data out of these different objects is pretty slow if I have to access it multiple times, so I've decided to use one pass to cache all that data into a one-dimensional array, and then all the actual drawing passes just look at the array. The catch is that because I need to do OpenGL pushes/pops in the right order, the array must be in the proper depth-first search order that is representative of the tree heirarchy. In the example above, the array must be ordered as follows:

index 0: Bicycle Frame
index 1: Handle Bars 
index 2: Front Wheel
index 3: Back Wheel
index 4: Car Frame
index 5: Front Left Wheel
index 6: Front Right Wheel
index 7: Back Left Wheel
index 8: Back Right Wheel

So, the ordering of the array must be serialized properly, but once I have assigned that ordering properly, I can parallelize the filling of the array. For example once I've assigned Bicycle Frame to index 0 and Handle Bars to index 1, one thread can take the filling of the array element for the Bicycle Frame while another takes the the filling of the array element for Handle Bars.

OK, I think in clarifying this, I've answered my own question, so thanks Davide. So I've posted my own answer.

A: 

to parallelise the child thread, simply put a pragma before the loop:

#pragma omp parallel for
for (i=0; i < elements; i++) 
{
}

Job done.

Now, you're quite right you cannot get any threading library to do one bit before another in a fully parallel way (obviously!), and openMP doesn't have a 'lock' or 'wait' feature (it does have a 'wait for all to finish' keyword - Barrier), its not designed to emulate a thread library, but it does allow you to store values "outside" the parallel section, and to mark certain sections as 'single threaded only' (Ordered keyword) so this may help you to assign the indexes in a parallel loop while other threads are assigning elements.

Take a look at a getting started guide.

If you're using Visual C++, you'll also need to set the /omp flag in your compiler build settings.

gbjbaanb
+1  A: 

I think you should clarify better your question (e.g. what exactly must be done serially and why)

OpenMP (like many other parallelization libraries) does not guarantee the order in which the various parallel sections will be executed, and since they are truly parallel (on a multicore machine) there might be race conditions if different sections write the same data. If that's ok for your problem, surely you can use it.

Davide
Davide, thanks for making me think through the process a little more. In editing my question and thinking it through more rigorously, I figured out a sufficient answer.
Anthony Johnson
+1  A: 

As gbjbaanb mentioned, you can do this easily - it just requires a pragma statement to parallelize this.

However, there are a few things to watch out for:

First, you mention that order is crutial here. If you need to preserve ordering in flattening a hierarchical structure, parallelizing (at this level) is going to be problematic. You're likely going to completely lose your ordering.

Also, parallelizing recursive functions has many problems. Take an extreme case - say you have a dual core machine, and you have a tree where each "parent" node has 4 children. If the tree is deep, you very, very quickly "over-parallelize" the problem, typically making things worse, not better, performance wise.

If you're going to do this, you should probably put a level parameter, and only parallelize the first couple of levels. Take my 4 child-per-parent example, if you parallelize the first 2 levels, you already are breaking this into 16 parallel chunks (called from 4 parallel chunks).

From what you mentioned, I'd leave this portion serial, and focus instead of the second where you mention:

"Then it traverses that array multiple times to draw objects/overlays, etc."

That sounds like an ideal place to parallelize.

Reed Copsey
Reed, I agree that traversing a one dimensional array is much more easy to parallelize than a recursive tree search, but since OpenGL is not thread-safe, the actual drawing part has to be done in serial. However, I think I've got a valid solution where I can do a minimalist recursive algorithm to make the array index associations in serial, and then do the filling of the array parallel.
Anthony Johnson
A: 

Here's a modified piece of pseudo-code that should work.

populatearray(thescene)
{
  recursivepopulatearray(thescene)

  #pragma omp parallel for
  for each element in array
    populate array element based on associated object
}

recursivepopulatearray(theobject)
{
  for each childobject in theobject
  {
     assign array index and associate element with childobject
     recursivepopulatearray(childobject)
  }
}
Anthony Johnson
A: 

Questions about OpenMP programming can get answered by the experts over at the OpenMP Forum at openmp.org, the official OpenMP website.