Consider the following situation:

class MyFoo {
public:
  MyFoo();
  ~MyFoo() {}
  void doSomething(void);
private:
  unsigned short things[10]; 
};

class MyBar {
public:
  MyBar(unsigned short* globalThings);
  ~MyBar() {}
  void doSomething(void);
private:
  unsigned short* things;
};

MyFoo::MyFoo() {
  int i;
  for (i=0;i<10;i++) this->things[i] = i;
}

MyBar::MyBar(unsigned short* globalThings) {
  this->things = globalThings;
}

void MyFoo::doSomething() {
  int i, j;
  j = 0;
  for (i = 0; i<10; i++) j += this->things[i];
}

void MyBar::doSomething() {
  int i, j;
  j = 0;
  for (i = 0; i<10; i++) j += this->things[i];
}


int main(int argc, char* argv[]) {
  unsigned short gt[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

  MyFoo* mf = new MyFoo();
  MyBar* mb = new MyBar(gt);

  mf->doSomething();
  mb->doSomething();

  delete mb;
  delete mf;
}

Is there an a priori reason to believe that mf->doSomething() will run faster than mb->doSomething()? Does that change if the executable is 100MB?

+2  A: 

There's little reason to believe one will be noticeably faster than the other. If gt (for example) were large enough for it to matter, you might get slightly better performance from:

int j = std::accumulate(gt, gt+10, 0);

With only 10 elements, however, a measurable difference seems quite unlikely.
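
For reference, a minimal, self-contained sketch of that approach (the includes and the surrounding main are illustrative, not part of the question's code):

#include <numeric>
#include <iostream>

int main() {
  unsigned short gt[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  // std::accumulate from <numeric> sums the range [gt, gt + 10) into an int
  int j = std::accumulate(gt, gt + 10, 0);
  std::cout << j << '\n'; // prints 55
}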

Jerry Coffin
+2  A: 

Because anything can modify your gt array, there may be some optimizations available for MyFoo that are unavailable to MyBar (though, in this particular example, I don't see any).

Since gt lives locally on main's stack, while mf's things array lives on the heap (along with the rest of *mf and *mb), there may be some memory access & caching differences when dealing with things. But if you'd created mf locally (MyFoo mf = MyFoo()), then that would not be an issue (i.e. things and gt would be on an equal footing in that regard).
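
For concreteness, here is a sketch of that locally-created variant, reusing the classes from the question (only main changes; this is illustrative, not the OP's code):

int main() {
  unsigned short gt[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

  MyFoo mf;      // mf and its embedded things[10] both live on main's stack
  MyBar mb(gt);  // mb is on the stack too; its things pointer refers to gt

  mf.doSomething();
  mb.doSomething();
}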

The size of the executable shouldn't make any difference. The size of the data might, but for the most part, after the first access both arrays will be in the CPU cache and there should be no difference.

James Curran
Thanks, this was my roundabout way of trying to figure out whether making data local to the class instance is meaningfully 'faster' than keeping the data global. I think you answered it by saying, "Probably not," which is good enough for me.
David
Actually, what I was going for was "It's *way* more complicated than that....."
James Curran
+1  A: 

Most likely the extra dereference (in MyBar, which has to fetch the value of the member pointer before it can index the array) is meaningless performance-wise, especially if the data array is very large.

adamk
+2  A: 

MyFoo::doSomething can be expected to be marginally faster than MyBar::doSomething. This is because when things is stored locally as an array inside the object, we just need to dereference this to get to things and can access the array immediately. When things is stored externally, we first need to dereference this and then dereference things before we can access the array, so we have two load instructions.

I have compiled your source to assembly (using -O0), and the loop for MyFoo::doSomething looks like:

    jmp .L14
.L15:
    movl    -4(%ebp), %edx 
    movl    8(%ebp), %eax //Load this into %eax
    movzwl  (%eax,%edx,2), %eax //Load this->things[i] into %eax
    movzwl  %ax, %eax
    addl    %eax, -8(%ebp)
    addl    $1, -4(%ebp)
.L14:
    cmpl    $9, -4(%ebp)
    setle   %al
    testb   %al, %al
    jne .L15

Now for MyBar::doSomething we have:

    jmp .L18
.L19:
    movl    8(%ebp), %eax //Load this
    movl    (%eax), %eax //Load this->things
    movl    -4(%ebp), %edx
    addl    %edx, %edx
    addl    %edx, %eax
    movzwl  (%eax), %eax //Load this->things[i]
    movzwl  %ax, %eax
    addl    %eax, -8(%ebp)
    addl    $1, -4(%ebp)
.L18:
    cmpl    $9, -4(%ebp)
    setle   %al
    testb   %al, %al
    jne .L19

As can be seen from the above, there is the double load. The problem may be compounded if this and this->things are far apart in memory: they may then live in different cache lines (or even different pages), and the CPU may have to do two pulls from main memory before this->things can be accessed. When the array is part of the same object, fetching this tends to bring this->things into the cache at the same time.

Caveat - the optimizer may be able to find some shortcuts that I have not thought of, though.
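
For example, because j is never used, an optimizing build is free to delete the loops entirely. Here is a sketch of how to keep the work observable (the FooReturning name and layout are mine, not from the question; a real benchmark would also need runtime data so the sum isn't simply constant-folded):

#include <iostream>

struct FooReturning {
  unsigned short things[10];
  int doSomething() const {
    int j = 0;
    for (int i = 0; i < 10; ++i) j += things[i];
    return j; // the caller consumes the value, so the loop is not dead code
  }
};

int main() {
  FooReturning f = {{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}};
  std::cout << f.doSomething() << '\n'; // prints 55
}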

doron
I was about to say: with -O0 this is meaningless.
DeadMG
The problem is that since MyFoo::doSomething and MyBar::doSomething just modify a local variable, an optimizing compiler spots this and optimizes both functions down to nothing. In the more general case, I think I have some grounds to say that a double dereference is slower than the single dereference you get when things is stored locally.
doron
A: 

It could be somewhat slower. The question is simply how often you access it. What you should consider is that your machine has a fixed amount of cache. When MyFoo is loaded to have doSomething called on it, the processor can just load the whole array into cache and read it. With MyBar, however, the processor must first load the pointer and then load the memory it points to. Of course, in your example main they're all probably in the same cache line, or close enough, anyway, and for a larger array the number of loads won't increase substantially because of that one extra dereference.

However, in general, this effect is far from ignorable. The extra indirection instruction itself costs pretty much nothing compared to actually loading the memory the pointer points to. If the pointer points to memory that is already in cache, the difference is negligible; if it doesn't, you get a cache miss, which is very expensive. In addition, the pointer introduces aliasing issues, which basically means the compiler can perform much less aggressive optimization on code that uses it.
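
As a sketch of that last point (my own illustration, not tied to the question's classes): because out and src below could point into the same array, the compiler must assume every store through out may change *src, so it reloads *src on each iteration instead of keeping it in a register. Non-standard extensions like GCC's/MSVC's __restrict, or copying the value to a local first, are the usual ways to hand that guarantee back to the optimizer.

// out and src have the same element type, so they are allowed to alias
void fill(unsigned short* out, const unsigned short* src, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = *src; // *src cannot be hoisted out of the loop
}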

Allocate within-object whenever possible.

DeadMG