views:

891

answers:

4

Purely as an experiment, I'm writing sort functions in MATLAB then running these through the MATLAB profiler. The aspect I find most perplexing is to do with swapping elements.

I've found that the "official" way of swapping two elements in a matrix

self.Data([i1, i2]) = self.Data([i2, i1])

runs much slower than doing it in four lines of code:

e1 = self.Data(i1);
e2 = self.Data(i2);
self.Data(i1) = e2;
self.Data(i2) = e1;

The total length of time taken up by the second example is 12 times less than the single line of code in the first example.

Would somebody have an explanation as to why?

+4  A: 

In the first (slow) approach, the RHS value is a matrix, so I think MATLAB incurs a performance penalty in creating a new matrix to store the two elements. The second (fast) approach avoids this by working directly with the elements.

Check out the "Techniques for Improving Performance" article on MathWorks for ways to improve your MATLAB code.

Zach Scrivena
+1  A: 

you could also do:

tmp = self.Data(i1);
self.Data(i1) = self.Data(i2);
self.Data(i2) = tmp;
Jason S
+1  A: 

Zach is potentially right in that a temporary copy of the matrix may be made to perform the first operation, although I would hazard a guess that there is some internal optimization within MATLAB that attempts to avoid this. It may be a function of the version of MATLAB you are using. I tried both of your cases in version 7.1.0.246 (a couple years old) and only saw a speed difference of about 2-2.5.

It's possible that this may be an example of speed improvement by what's called "loop unrolling". When doing vector operations, at some level within the internal code there is likely a FOR loop which loops over the indices you are swapping. By performing the scalar operations in the second example, you are avoiding any overhead from loops. Note these two (somewhat silly) examples:

vec = [1 2 3 4];

%Example 1:
for i = 1:4,
  vec(i) = vec(i)+1;
end;

%Example 2:
vec(1) = vec(1)+1;
vec(2) = vec(2)+1;
vec(3) = vec(3)+1;
vec(4) = vec(4)+1;

Admittedly, it would be much easier to simply use vector operations like:

vec = vec+1;

but the examples above are for the purpose of illustration. When I repeat each example multiple times over and time them, Example 2 is actually somewhat faster than Example 1. For a small loop with a known number (in the example, just 4), it can actually be more efficient to forgo the loop. Of course, in this particular example, the vector operation given above is actually the fastest.

I usually follow this rule: Try a few different things, and pick the fastest for your specific problem.

gnovice
+4  A: 

Based on suggestions posted, I've run some more tests. It appears the performance hit comes when the same matrix is referenced in both the LHS and RHS of the assignment.

My theory is that MATLAB uses an internal reference-counting / copy-on-write mechanism, and this is causing the entire matrix to be copied internally when it's referenced on both sides. (This is a guess because I don't know the MATLAB internals).

Here are the results from calling the function 885548 times. (The difference here is times four, not times twelve as I originally posted. Each of the functions have the additional function-wrapping overhead, while in my initial post I just summed up the individual lines).

 swap1: 12.547 s
 swap2: 14.301 s
 swap3: 51.739 s

Here's the code:

 methods (Access = public)
     function swap(self, i1, i2)
        swap1(self, i1, i2);
        swap2(self, i1, i2);
        swap3(self, i1, i2);
        self.SwapCount = self.SwapCount + 1;
    end
 end

 methods (Access = private)
    %
    % swap1: stores values in temporary doubles
    %         This has the best performance
    %
    function swap1(self, i1, i2)
        e1 = self.Data(i1);
        e2 = self.Data(i2);
        self.Data(i1) = e2;
        self.Data(i2) = e1;
    end

    %
    % swap2: stores values in a temporary matrix
    %        Marginally slower than swap1
    %
    function swap2(self, i1, i2)
        m = self.Data([i1, i2]);
        self.Data([i2, i1]) = m;
    end

    %
    % swap3: does not use variables for storage.
    %        This has the worst performance
    %
    function swap3(self, i1, i2)
        self.Data([i1, i2]) = self.Data([i2, i1]);
    end


end
Andrew Shepherd