views: 125
answers: 5

Hello. I have a question regarding the speed of pointer dereferencing. I have a structure like so:

typedef struct _TD_RECT TD_RECT;
struct _TD_RECT {
  double left;
  double top;
  double right;
  double bottom;
};

My question is, which of these would be faster and why?


CASE 1:

TD_RECT *pRect;
...
for(i = 0; i < m; i++)
{
   if(p[i].x < pRect->left) ...
   if(p[i].x > pRect->right) ...
   if(p[i].y < pRect->top) ...
   if(p[i].y > pRect->bottom) ...
}

CASE 2:

TD_RECT *pRect;
double left = pRect->left;
double top = pRect->top;
double right = pRect->right;
double bottom = pRect->bottom;
...
for(i = 0; i < m; i++)
{
   if(p[i].x < left) ...
   if(p[i].x > right) ...
   if(p[i].y < top) ...
   if(p[i].y > bottom) ...
}

So in case 1, the loop dereferences the pRect pointer directly to obtain the comparison values. In case 2, new variables were created in the function's local space (on the stack) and the values were copied from pRect into them. Over the course of the loop there will be many comparisons.

In my mind, they would be equally slow, because the local variable is also a memory reference on the stack, but I'm not sure...

Also, would it be better to keep referencing p[] by index, or to increment p by one element each iteration and dereference it directly without an index?

Any ideas? Thanks :)

+1  A: 

I think the second case is likely to be faster because you are not dereferencing the pointer to pRect on every loop iteration.

Practically, an optimising compiler may notice this and there might be no difference in the generated code, but the possibility of pRect aliasing an item in p[] could prevent that.

andrewmu
+7  A: 

You'll probably find it won't make a difference with modern compilers. Most of them would perform common subexpression elimination on the expressions that don't change within the loop. It's not wise to assume there's a simple one-to-one mapping between your C statements and assembly code. I've seen gcc pump out code that would put my assembler skills to shame.

But this is neither a C nor a C++ question, since the ISO standards don't mandate how it's done. The best way to check for sure is to generate the assembler code with something like gcc -S and examine the two cases in detail.

You'll also get more return on your investment if you steer away from this sort of micro-optimisation and concentrate more on the macro level, such as algorithm selection and such.

And, as with all optimisation questions, measure, don't guess! There are too many variables which can affect it, so you should be benchmarking different approaches in the target environment, and with realistic data.

paxdiablo
I am asking because the function I am writing clips polygons for vector maps that contain millions of vertices... any bit of speed I can squeeze out of it will help, because I need to clip each section into 1-degree areas.
That's fine. The correct thing to do is to run real-world benchmarks since it depends on a large number of factors, very few of which we know. At a bare minimum, your target machine, the compiler, the CPU, other architectural issues like memory and I/O subsystems, the composition of your data, the optimisation level and so forth.
paxdiablo
Given that you have millions of them, look at my answer below. Create an index on your vector map (that is p?), sorted by x and by y. (Does it remain fairly static?) Or sort it on x and keep an index on y. Use binary search to find all x < left, all x > right, all y < top, all y > bottom. If you have, say, 4 million points, that is about 22 comparisons for each search, 88 in total, instead of the 16 million you do now!
CashCow
Well, each part of the vector map is itself structured into a specific area (ESRI shapefiles)... so I have the general area for each one and can call it up quickly... but then I need to divide each of those sections into 1-degree-square areas.
+1 for "*measure, don't guess!*" BTW, ESRI shapefiles? I feel your pain.
John Bode
Yeah, seriously... I don't understand why anyone would put two different byte orderings in one file specification.
A: 

An optimizing compiler will see that the structure accesses are loop-invariant and perform loop-invariant code motion, making your two cases compile to the same code.

codaddict
+3  A: 

It is not likely to be a hugely performance-critical difference. You could profile each option over multiple runs and see. Ensure your compiler optimisations are enabled for the test.

With regards to storing the doubles, you might get some performance benefit from declaring them const. How big is your array?

With regards to using pointer arithmetic, this can be faster, yes.

You can optimise immediately if you know left < right in your rect (surely it must be): if x < left, it cannot also be > right, so you can use an "else".

Your big optimisation, if there is one, would come from not having to loop through all the items in your array and not have to perform 4 checks on all of them.

For example, if you indexed or sorted your array on x and y, you would be able, using binary search, to find all values that have x < left and loop through just those.

CashCow
A: 

I will be surprised if even a totally non-optimised compile (-O0) produces different code for the two cases presented. In order to perform any operation on a modern processor, the data needs to be loaded into registers. So even when you declare automatic variables, these variables will not exist in main memory but rather in one of the processor's floating-point registers. This will be true even when you do not declare the variables yourself, so I expect no difference in the generated machine code even when you declare the temporary variables in your C++ code.

But as others have said, compile the code into assembler and see for yourself.

doron