views:

179

answers:

1

Hey,

I have a dynamic programming algorithm for Knapsack in C++. When it was implemented as a function and accessing variables passed into it, it was taking 22 seconds to run on a particular instance. When I made it the member function of my class KnapsackInstance and had it use variables that were data members of that class, it started taking 37 seconds to run. As far as I know, only accessing member functions goes through the vtable so I'm at a loss to explain what might be happening.

Here's the code of the function

int KnapsackInstance::dpSolve() {
 int i; // Current item number
 int d; // Current weight
 int * tbl; // Array of size weightLeft
 int toret;
 tbl = new int[weightLeft+1];
 if (!tbl) return -1;
 memset(tbl, 0, (weightLeft+1)*sizeof(int));
 for (i = 1; i <= numItems; ++i) {
  for (d = weightLeft; d >= 0; --d) {
   if (profitsWeights.at(i-1).second <= d) {
    /* Either add this item or don't */
    int v1 = profitsWeights.at(i-1).first + tbl[d-profitsWeights.at(i-1).second];
    int v2 = tbl[d];
    tbl[d] = (v1 < v2 ? v2 : v1);
   }
  }
 }
 toret = tbl[weightLeft];
 delete[] tbl;
 return toret;
}

tbl is one column of the DP table. We start from the first column and go on until the last column. The profitsWeights variable is a vector of pairs, the first element of which is the profit and the second the weight. toret is the value to return.

Here is the code of the original function :-

int dpSolve(vector<pair<int, int> > profitsWeights, int weightLeft, int numItems) {
 int i; // Current item number
 int d; // Current weight
 int * tbl; // Array of size weightLeft
 int toret;
 tbl = new int[weightLeft+1];
 if (!tbl) return -1;
 memset(tbl, 0, (weightLeft+1)*sizeof(int));
 for (i = 1; i <= numItems; ++i) {
  for (d = weightLeft; d >= 0; --d) {
   if (profitsWeights.at(i-1).second <= d) {
    /* Either add this item or don't */
    int v1 = profitsWeights.at(i-1).first + tbl[d-profitsWeights.at(i-1).second];
    int v2 = tbl[d];
    tbl[d] = (v1 < v2 ? v2 : v1);
   }
  }
 }
 toret = tbl[weightLeft];
 delete[] tbl;
 return toret;
}

This was run on Debian Lenny with g++-4.3.2 and -O3 -DNDEBUG turned on

Thanks

+3  A: 

In a typical implementation, a member function receives a pointer to the instance data as a hidden parameter (this). As such, access to member data is normally via a pointer, which may account for the slow-down you're seeing.

On the other hand, it's hard to do more than guess with only one version of the code to look at.

After looking at both pieces of code, I think I'd write the member function more like this:

int KnapsackInstance::dpSolve() {
    std::vector<int> tbl(weightLeft+1, 0);
    std::vector<pair<int, int> > weights(profitWeights);
    int v1;

    for (int i = 0; i <numItems; ++i) 
        for (int d = weightLeft; d >= 0; --d)
            if ((weights[i+1].second <= d) && 
                ((v1 = weights[i].first + tbl[d-weights[i-1].second])>tbl[d]))
                    tbl[d] = v1;
    return tbl[weightLeft];
}
Jerry Coffin
While you are correct about (this) being passed implicitly, this shouldn't matter after the first pass of a loop. Think about it: once it's accessed once, it should be cached. Dereferencing the pointer can be done in 2 instructions on many processors: one to load the pointer, one to load the data in index-deferred mode. Comparatively, loading a variable normally would incur only one instruction. Any overhead would not be noticeable, especially considering the aggressive nature of compiler optimizations and hardware-level instruction reordering.
San Jacinto
@San Jacinto:You might be right -- but then again, perhaps not. As implied in my post, at the time there was only one version of the code in the question, so all I could go from was "one time it's a member function, the other it's not", and about the only difference in that case is access via `this`. I can see possibilities other than just pointer dereferencing as causing a bottleneck though, such as all the access via `this` preventing some parallel execution. Tough to guess about though...
Jerry Coffin
I made a local alias to profitsWeights vector and the time went back to normal. Thanks
Sid