I was reading this article (note: click the magnifying glass to zoom to be able to read it), and this guy goes on talking about how everyone can greatly benefit from mixing in data oriented design with OOP. He doesn't show any code samples, however.

I googled this and couldn't find any real information as to what this is, let alone any code samples. Is anyone familiar with this term and can provide an example? Is this maybe a different word for something else?

+2  A: 

A data oriented design is a design in which the logic of the application is built up of data sets, instead of procedural algorithms. For example:

Procedural approach:

int animation = 0; // this value is the animation index

if (animation == 0)
    PerformMoveForward();
else if (animation == 1)
    PerformMoveBack();
// ... etc

Data design approach:

typedef struct
{
   int Index;
   void (*Perform)();
}AnimationIndice;

// build my animation dictionary
AnimationIndice AnimationIndices[] =
  {
      { 0, PerformMoveForward },
      { 1, PerformMoveBack }
  };

// when it's time to run, I use my dictionary to find my logic
int animation = 0; // this value is the animation index; it must be a valid index into the table
AnimationIndices[animation].Perform();

Data designs like this promote the use of data to build the logic of the application. It's easier to manage, especially in video games, which might have thousands of logic paths based on animation or some other factor.
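For completeness, here is a minimal, self-contained sketch of the same table-driven idea. The handler bodies, the `last_animation` variable, the `AnimationEntry` name, and the `RunAnimation` helper are all hypothetical, added only so the effect is observable and the lookup is bounds-checked.

```cpp
#include <cstddef>
#include <string>

// Hypothetical handlers: each records which animation ran.
static std::string last_animation;

void PerformMoveForward() { last_animation = "forward"; }
void PerformMoveBack()    { last_animation = "back"; }

struct AnimationEntry {
    int index;          // animation index, kept for readability of the table
    void (*perform)();  // logic to run for this animation
};

// The table is the "data": adding an animation means adding a row.
static const AnimationEntry kAnimations[] = {
    {0, PerformMoveForward},
    {1, PerformMoveBack},
};
static const std::size_t kAnimationCount =
    sizeof(kAnimations) / sizeof(kAnimations[0]);

// Looks up the handler by index and runs it.
// Returns false (and runs nothing) if the index is out of range.
bool RunAnimation(int animation) {
    if (animation < 0 ||
        static_cast<std::size_t>(animation) >= kAnimationCount) {
        return false;
    }
    kAnimations[animation].perform();
    return true;
}
```

Calling `RunAnimation(1)` runs `PerformMoveBack`, while an unknown index such as `RunAnimation(2)` is rejected instead of reading past the end of the table.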

Andrew Keith
This is actually not correct. You are confusing data oriented design with data driven design. I did the same thing until I read Noel's article and realized he was talking about something entirely different.
Adam Smith
+11  A: 

First of all, don't confuse this with data driven design.

My understanding of Data Oriented Design is that it is about organizing your data for efficient processing, especially with respect to cache misses. Data Driven Design, on the other hand, is about letting data control a lot of your program's behavior (described very well by Andrew Keith above).

Say you have ball objects in your application with properties such as color, radius, bounciness, position etc. In OOP you would describe your balls like this:

class Ball {
  Point  pos;
  Color  color;
  double radius;

  void draw();
};

And then you would create a collection of balls like this:

vector<Ball> balls;

In Data Oriented Design however you are more likely to write the code like this:

class Balls {
  vector<Point>  pos;
  vector<Color>  color;
  vector<double> radius;

  void draw();
};

As you can see, there is no single unit representing one Ball anymore. Ball objects only exist implicitly. I don't want to rewrite the article, so I am not going to go into detail about why one does it like this, but it can have many advantages performance-wise. First, we usually want to do operations on many balls at the same time, and hardware usually wants large contiguous chunks of memory to operate efficiently. Secondly, you might do operations that affect only part of a ball's properties. E.g. if you combine the colors of all the balls in various ways, then you want your cache to contain only color information. However, when all ball properties are stored in one unit, you will pull in all the other properties of a ball as well, even though you don't need them.
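As a rough sketch of the color case described above: the `Point` and `Color` definitions, the `add` and `size` helpers, and the `darken` function are assumptions made up for illustration, not something from the article. The point is only that the loop walks one contiguous vector and never touches positions or radii.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical property types; the real ones could be anything.
struct Point { float x, y; };
struct Color { float r, g, b; };

// Structure-of-arrays layout: index i across the three vectors is "ball i".
struct Balls {
    std::vector<Point>  pos;
    std::vector<Color>  color;
    std::vector<double> radius;

    // Keeps the three vectors the same length.
    void add(Point p, Color c, double r) {
        pos.push_back(p);
        color.push_back(c);
        radius.push_back(r);
    }
    std::size_t size() const { return pos.size(); }
};

// Scales every color channel by `factor`.
// Only the color vector is read and written here.
void darken(Balls& balls, float factor) {
    for (Color& c : balls.color) {
        c.r *= factor;
        c.g *= factor;
        c.b *= factor;
    }
}
```

With a `vector<Ball>` the same loop would step over the position and radius of every ball on its way to the next color.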

Say each ball takes up 64 bytes and a Point takes 4 bytes. A cache line is, say, 64 bytes as well. If I want to update the position of 10 balls, I have to pull 10*64 = 640 bytes of memory into cache and get 10 cache misses. If, however, I can work with the positions of the balls as separate units, that takes only 4*10 = 40 bytes, which fits in one cache fetch. Thus we get only 1 cache miss to update all 10 balls. These numbers are arbitrary; I assume a cache block is bigger.

But it illustrates how memory layout can have a severe effect on cache hits and thus performance. This will only increase in importance as the difference between CPU and RAM speed widens.
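The arithmetic above can be checked with a small helper. This is only a sketch under simplifying assumptions: the data starts at a cache-line-aligned address, every access is a miss the first time a line is touched, and nothing is evicted. The function name `lines_touched` and its parameters are made up for this illustration.

```cpp
#include <cstddef>

// Counts the distinct cache lines touched when reading `count` elements of
// `elem_size` bytes each, placed `stride` bytes apart, starting at offset 0
// of a line-aligned buffer.
// Assumes elem_size >= 1, stride >= elem_size, and line_size >= 1.
std::size_t lines_touched(std::size_t count, std::size_t elem_size,
                          std::size_t stride, std::size_t line_size) {
    std::size_t lines = 0;
    std::size_t next_uncounted = 0;  // lowest line index not counted yet
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t first = (i * stride) / line_size;
        std::size_t last  = (i * stride + elem_size - 1) / line_size;
        if (first < next_uncounted) {
            first = next_uncounted;  // skip lines already counted
        }
        if (last >= first) {
            lines += last - first + 1;
            next_uncounted = last + 1;
        }
    }
    return lines;
}
```

With the numbers from the example, `lines_touched(10, 4, 64, 64)` models positions embedded in 64-byte balls and gives 10 lines, while `lines_touched(10, 4, 4, 64)` models a packed position array and gives 1.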

Adam Smith
Thanks for this, you explained it very well.
ryeguy
well said; I've got only one question though. Let's say we have a structure `struct balls {vector<vec3> pos; vector<vec3> velocity;}`, wouldn't updating the position of each ball actually thrash the cache since you'd move back and forth between the velocity vector and the position vector (yes modern machines and cache-lines and all that, this is also just an illustration)?
roe
It might. But remember the whole pos array will not be pulled in at one time, just one cache line and possibly some prefetching. Likewise with velocity. So for them to thrash each other, each corresponding chunk of pos and velocity has to map to the same cache line. That can of course happen, which is why the recommendation is to put variables that are used together in the same struct. So e.g. velocity and pos would be in one vector while color would be in another vector.
Adam Smith
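The recommendation in the comment above (keep fields that are used together in one struct, and split off the rest) might look roughly like this. The `Vec3`, `Color`, and `Motion` types and the `integrate` function are hypothetical names chosen for the sketch.

```cpp
#include <cstddef>
#include <vector>

struct Vec3  { float x, y, z; };
struct Color { float r, g, b; };

// pos and velocity are always read and written together in the update step,
// so they share one struct and sit next to each other in memory.
struct Motion {
    Vec3 pos;
    Vec3 velocity;
};

struct Balls {
    std::vector<Motion> motion;  // data touched by the movement update
    std::vector<Color>  color;   // never touched by the movement update
};

// Advances every ball by one time step of length dt.
// Walks a single contiguous vector; color data is never loaded.
void integrate(Balls& balls, float dt) {
    for (Motion& m : balls.motion) {
        m.pos.x += m.velocity.x * dt;
        m.pos.y += m.velocity.y * dt;
        m.pos.z += m.velocity.z * dt;
    }
}
```

Which fields count as "used together" depends on the loops the program actually runs, so this grouping is a judgment call rather than a fixed rule.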
+2  A: 

I just want to point out that Noel is talking specifically about some of the needs we face in game development. I suppose other sectors that do soft real-time simulation would benefit from this, but it is unlikely to be a technique that will show noticeable improvement in general business applications. This setup is for ensuring that every last bit of performance is squeezed out of the underlying hardware.

bill c