views:

318

answers:

4

I have a long running function in MATLAB that I tried to speed up by adding caching and wound up slowing down my performance significantly. My code is basically searching for continuous "horizontal" lines in an edge detected image and the original code looks something like this:

function lineLength = getLineLength(img, startRow, startCol)
    [nRows, nCols] = size(img);
    lineLength = 0;
    if startRow < 1 || startRow > nRows
        return;
    end

    for curCol = startCol:nCols
        if img(curCol)
            lineLength = lineLength + 1;
            continue;
        elseif lineLength > 0
            lengths = zeros(2,1);
            lengths(1) = getLineLength(img, startRow - 1, curCol);
            lengths(2) = getLineLength(img, startRow + 1, curCol);
            increment = max(lengths);
            lineLength = lineLength + increment;
        end
        break; %// At this point the end of the current line has been reached
    end
end function

Since performance in this function is not what I would like, I thought I would add caching of the length from any point something like the following:

function lineLength = getLineLength(img, startRow, startCol)
persistent pointCache; 
    if startRow == 0 && startCol == 0
        pointCache = zeros(size(img, 1), size(img, 2), 2);
    end
    [nRows, nCols] = size(img);
    lineLength = 0;
    if startRow < 1 || startRow > nRows
        return;
    end

    for curCol = startCol:nCols
        if pointCache(startRow, curCol, 2)
            lineLength = lineLength + pointCache(startRow, curCol, 1);
            break;
        end
        if img(curCol)
            lineLength = lineLength + 1;
            continue;
        elseif lineLength > 0
            lengths = zeros(2,1);
            lengths(1) = getLineLength(img, startRow - 1, curCol);
            lengths(2) = getLineLength(img, startRow + 1, curCol);
            increment = max(lengths);
            lineLength = lineLength + increment;
        end
        break; %// At this point the end of the current line has been reached
    end
    pointCache(startRow, startCol, 1) = lineLength;
    pointCache(startRow, startCol, 2) = 1;
end function

What surprised me is that implementing this caching actually made my performance worse, rather than better. My best guesses are that either the global variable is getting me in trouble, or its the extra memory use, but I don't have enough MATLAB experience to know.

Edited...

As Gautam correctly pointed out that there was a bug in the original code that was ignoring the results of the recursion. This is what the actual code does. I'm sure it's obvious, but MATLAB is not my native language so if there's a more MATLABy way to do this, I'd love the suggestions.

A: 

My guess is that you're caching the wrong part of your code. The recursion in the elseif part seems to be the real bottleneck. The whole algorithm looks a bit strange to me, perhaps you should better try something like this (although I don't know for sure if this is what you want):

for every pixel p in img
  if (pixel p set)
    linelength = 1
    p2 = p
    while (pixel p2 set) and (p2 in same column as p)
      p++ // don't check lines twice
      p2++
      linelength++
    endwhile
  endif
schnaader
+2  A: 

I'm pretty sure the global is not the problem, but as a matter of style, you should use a persistent, which maintains it's value from call to call, but is local to the function.

Anytime you have performance problems, profile. Call profile on, then your function, then profile report. It will point out your real performance problems. Rarely is intuition good for profiling problems, especially in matlab. You can read the help, but it's pretty self-explanatory.

Marc
Thanks Marc, I didn't know about persistent. I much prefer having a more local scope and just didn't know how to achieve it.
Jon Norton
+2  A: 

It's not clear to me what the function does. In particular, why do you recursively call getLineLength and then effectively throw away the results (you only test if increment is greater than zero)?

My guess for why pointCache doesn't help: your function probably doesn't call itself repeatedly with the same paramters (startRow, startCol). Have you tried to log how many times getLineLength is called for a particular startRow and startCol?

Whatever your algorithm is, using recursion to iterate over a image is completely unsuited to MATLAB's strengths. If you want high performance:

  1. Set up your algorithm to use iteration instead of recursion, and
  2. Figure out how to vectorize the iterated parts.

Some tips on vectorizing:

  • Use built-in functions like sum, cumsum, diff, bsxfun, and accumarray to operate directly over the image matrix.
  • Complicated double-iteration computations on images can sometimes be reexpressed as matrix multiplications.
You're right, that was a bug because the actual code is on another machine w/o a connection to the internet so I was rewriting from memory. I did check my cache hit rate and it seemed like it should be helping. Now that you can see my actual intent is there a more MATLAB way to do it?
Jon Norton
A: 

As best I can tell, you're trying to find the number of non-zero elements for every column, though the code doesn't seem to quite accomplish that. Would something like the following work:

lineLengths = max(cumsum(img~=0, 1), 1)

If you're trying to extract blobs from your image, consider using the BWLABEL function.

I second what Gautam says about what typically works well in Matlab.

Mr Fooz