Is it faster to access a global or object variable?

In C++, I'm referring to the difference between

::foo

and

this->foo

In x86 assembler, this basically translates to

mov eax, dword ptr [foo]

vs

mov eax, dword ptr[edx+foo]

All data in both cases is expected to be in cache.

(I know the difference if any will be tiny, and one should usually go with whichever makes the code simpler; but in this case there is literally no other difference, and the code in question will be called maybe half a billion times under a time limit, so I might as well go with whichever is even slightly faster.)

+1  A: 

As the previous commenters said, measure it. Comparing the assembler instructions won't help you. Predicting the behavior of your computer's CPU cache is nearly impossible and depends on what other data is needed. Also, your program is not necessarily CPU-bound.

You might want to use placement new to ensure that the object holding foo is in a convenient place in memory to avoid page faults.
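
A minimal sketch of that idea, assuming a hypothetical `Holder` type and a 64-byte cache line (neither comes from the question). Placement new only controls where the object lives; whether that avoids page faults depends on the storage already being resident.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical object holding the hot variable; name and layout are assumptions.
struct Holder {
    int foo = 0;
};

// Static storage aligned to an assumed 64-byte cache line.
alignas(64) static unsigned char holder_storage[sizeof(Holder)];

// Construct a Holder inside the preselected storage instead of on the heap.
Holder* make_holder() {
    Holder* h = new (holder_storage) Holder{};
    // The object starts at the aligned storage address.
    assert(reinterpret_cast<std::uintptr_t>(h) % 64 == 0);
    return h;
}

// Placement new has no matching delete, so the destructor is called explicitly.
void destroy_holder(Holder* h) {
    h->~Holder();
}
```

Calling `make_holder()` once at startup and `destroy_holder()` at shutdown keeps the object at a fixed, known address for the whole run.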

Gabriel Schreiber
+9  A: 

You need to test and time both.

However, do this knowing that other decisions you've made in your app will have a performance impact several orders of magnitude greater than this one.

To human eyes the global looks faster to access. However, where the compiler decides to put things and how the processor decides to cache them will ultimately determine which is faster.

Test it and time it. I'd be stunned if you got meaningful differences in a non-trivial app over millions of runs.
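
A rough timing sketch, not a rigorous benchmark. `Widget`, `time_it` and `member_minus_global` are hypothetical names, and the variables are made `volatile` here only so the optimizer cannot hoist the loads out of the loop, which also means the generated code may differ from your real function.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the global in the question, read as ::foo.
volatile std::uint32_t foo = 1;

struct Widget {
    // Hypothetical stand-in for the member, read as this->foo.
    volatile std::uint32_t foo = 1;

    std::uint64_t sum_member(std::uint64_t n) const {
        std::uint64_t total = 0;
        for (std::uint64_t i = 0; i < n; ++i) total += this->foo;
        return total;
    }

    std::uint64_t sum_global(std::uint64_t n) const {
        std::uint64_t total = 0;
        for (std::uint64_t i = 0; i < n; ++i) total += ::foo;
        return total;
    }
};

struct Timing {
    double seconds;
    std::uint64_t result;
};

// Times a single call of `work` with a monotonic clock.
template <typename F>
Timing time_it(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    const std::uint64_t result = work();
    const auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(stop - start).count(), result};
}

// Returns member time minus global time, in seconds, for n loads each.
double member_minus_global(std::uint64_t n) {
    Widget w;
    const Timing g = time_it([&] { return w.sum_global(n); });
    const Timing m = time_it([&] { return w.sum_member(n); });
    assert(g.result == m.result);  // both loops must do the same work
    return m.seconds - g.seconds;
}
```

Run `member_minus_global` several times with a large `n` in an optimized build and look at the spread; a difference smaller than the run-to-run noise is not a difference worth acting on.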

Binary Worrier
+4  A: 

Please don't optimize this for speed.

There is an important semantic difference between the two, and keeping the data in the place that makes the most logical sense will save you more time than any reduction in run time.

Several billion iterations isn't really all that much. The CPU in my computer runs at a blazing 2.2 GHz. If it's in cache, the dereference is going to cost maybe an extra cycle, and so 100 billion loops is about 45 seconds of run time. I doubt I'll miss it.
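
The same back-of-the-envelope estimate as a small function, assuming a fixed clock rate and that the extra cycle is not hidden behind other work (both are simplifications). The function name is made up for illustration.

```cpp
#include <cassert>

// Extra wall-clock seconds caused by `extra_cycles` additional cycles
// per iteration on a CPU running at `clock_hz` cycles per second.
double extra_seconds(double iterations, double extra_cycles, double clock_hz) {
    assert(clock_hz > 0.0);
    return iterations * extra_cycles / clock_hz;
}
```

For the half a billion calls mentioned in the question, `extra_seconds(0.5e9, 1.0, 2.2e9)` comes out at roughly 0.23 seconds in total.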

TokenMacGuy
+3  A: 

Have you considered a third option?

void bar()
{
   foo_t myFoo = ::foo; // or this->foo; copy into a local once
   for(;;)
   {
       // do something with myFoo, and break when done
   }
   ::foo = myFoo; // write the result back once (or this->foo = myFoo)
}

In this case, the compiler is likely to keep myFoo in a register, which is faster than even a cache hit.

TokenMacGuy
Compilers aren't that bad at assigning registers. In fact, I don't know of any compiler that still respects the `register` keyword, as they're all confident that they don't need the hint at all.
MSalters
Compilers are more likely to use registers for local variables than for variables that must have an address, such as a global variable (which exists in the executable image) or explicitly dereferenced values (such as `this->foo`).
TokenMacGuy
+3  A: 

As always, go with what makes the code simpler.

If you use a global, then the reader of the code has to wonder why, and where else this variable is accessed from. How many threads is it accessed from? How are accesses from different threads synchronized?

If you make a local variable that is only visible where it's needed, then those questions go away.

Speed-wise, the only thing that might make a difference is cache locality. If the variable is accessed often, it'll get cached in both cases, but if it is located next to other recently used objects, they'll be able to share the same cache line, leaving more room free in the cache for other data.
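
As an illustration of the cache-line point, here is a sketch with a hypothetical `Particle` type and an assumed 64-byte line size (common on x86, but not guaranteed). Fields that are used together are declared together so they land in the same line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines.
constexpr std::size_t kAssumedCacheLine = 64;

// Hypothetical object: hot fields first, rarely used data after them.
struct alignas(kAssumedCacheLine) Particle {
    int foo;         // hot: read on every iteration
    float x;         // hot
    float y;         // hot
    char name[128];  // cold: rarely touched
};

static_assert(offsetof(Particle, y) + sizeof(float) <= kAssumedCacheLine,
              "hot fields should fit in the first cache line");

// True when both addresses fall into the same assumed cache line.
bool same_cache_line(const void* a, const void* b) {
    const auto line = [](const void* p) {
        return reinterpret_cast<std::uintptr_t>(p) / kAssumedCacheLine;
    };
    return line(a) == line(b);
}

// Sanity check that the hot fields of one object share a line.
void check_hot_fields(const Particle& p) {
    assert(same_cache_line(&p.foo, &p.x));
    assert(same_cache_line(&p.foo, &p.y));
}
```

A lone global gives you no such control over its neighbours; what ends up next to it is decided by the compiler and linker.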

But if the code is worth optimizing, then it's also worth measuring.

Avoiding globals is the simple, clean option. If performance is a problem, and your measurements indicate that using a global is faster, then switch to a global.

But bear in mind that you're also changing the semantics of your program. If you have multiple threads calling the function, you'll get a race condition if you use a global, where it was safe before.
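
A small sketch of that difference, using hypothetical names (`Counter`, `shared_foo`, `run_both`). Per-object state needs no synchronization when each thread owns its object, whereas a shared global has to become atomic (or be locked), because unsynchronized writes to a plain global from two threads are a data race.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Per-object state: each thread works on its own Counter.
struct Counter {
    std::uint64_t foo = 0;
    void bump(std::uint64_t times) {
        for (std::uint64_t i = 0; i < times; ++i) ++this->foo;
    }
};

// Shared global state: atomic so concurrent increments are well defined.
std::atomic<std::uint64_t> shared_foo{0};

void bump_shared(std::uint64_t times) {
    for (std::uint64_t i = 0; i < times; ++i)
        shared_foo.fetch_add(1, std::memory_order_relaxed);
}

// Runs two threads on separate objects, then two threads on the global.
void run_both(std::uint64_t times) {
    Counter a, b;
    std::thread t1([&] { a.bump(times); });
    std::thread t2([&] { b.bump(times); });
    t1.join();
    t2.join();
    assert(a.foo == times && b.foo == times);

    shared_foo = 0;
    std::thread t3(bump_shared, times);
    std::thread t4(bump_shared, times);
    t3.join();
    t4.join();
    assert(shared_foo.load() == 2 * times);
}
```

Note that the atomic increment is itself more expensive than a plain one, so a global that has to be shared safely can easily cost more than the member access it was meant to beat.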

jalf
+1 for "if the code is worth optimizing, then it's also worth measuring". Note that some ("low-performance") processors might see a speed difference. More importantly, a relative load with a small offset will probably be smaller (code-wise) than an absolute load; I generally prefer the version which takes up less space in your I-cache.
tc.