views:

179

answers:

6

How to optimize this line drawing routine ? Will memcpy work faster ?

void ScreenDriver::HorizontalLine(int wXStart, int wXEnd, int wYPos,
    COLORVAL Color, int wWidth)
{
    int iLen = wXEnd - wXStart + 1;

    if (iLen <= 0)
    {
        return;
    }
    while(wWidth-- > 0)
    {
        COLORVAL *Put = mpScanPointers[wYPos] + wXStart;
        int iLen1 = iLen;

        while(iLen1--)
        {
            *Put++ = Color;
        }
        wYPos++;
    }
}
+1  A: 

No, not really. memcpy copies memory, that's a read and a write and you don't need the read. memset, which only writes, only writes bytes, so that isn't going to work either, unless COLORVAL is also a byte. No, leave it as is, the compiler should produce a fairly good bit of code. Don't forget that you are probably limited by memory bandwidth.

Skizz
A: 

I've found through personal experience that memcpy is slightly faster than direct pointer access... but only slightly, it isn't usually a ground-breaking optimization.

DeusAduro
+4  A: 

I think you mean to say "memset" instead of "memcpy". Replacing this bit of the code:

while (iLen--)
{
    *Put++ = Color;
}

with

memset(Put, Color, iLen);

could be faster but so much depends on your target CPU, memory architecture and the typical values of iLen encountered. It's not likely to be a big win, but if you've got the time I encourage you to measure the alternatives as that kind of exercise is the only way to really understand optimization.

Of course, this memset() use will only work if COLORVAL is character sized.

George Phillips
A: 

One of the fastest ways to draw a horizontal line, aka fill an array with a value, in assembly is to use the stosb, stosw, stosd instructions. memset is optimized to use stosb. To use dword values we can write code like the one below to draw a line,

__asm {
        cld
        mov eax, color
        mov ecx, screen_width
        mov edi, video_buffer
        rep stosd
}

But I'm quite sure that your inner while loop will be optimized by the compiler to use the stosd anyway.

Nick D
+1  A: 

Your best bet before doing anything else is to employ whatever low-level profiling tools you have available. At the very least get an overall timing for a hefty test case or 3. Without a baseline measurement you're shooting in the dark. (I should know, I'm as guilty of this as anyone else!)

That said I note that your code looks like it has a fair bit of overhead per pixel,

  1. A memset() call could be a win (if COLORVAL is sizeof(char) ).

  2. Alternately, unrolling the loop may help - this is heavily dependent on you input data, machine architecture etc.

  3. If your iLen value is reasonably bounded you might consider writing a custom function for each iLen value that is fully unrolled (inline the first few smallish cases in a switch) and call the bigger cases through an array of function pointers.

  4. The fastest option of course is usually to resort to assembly.

Andy J Buchanan
A: 

You could try unrolling the inner loop, but really it's only going to matter for lines close to horizontal.

For lines that are not close to horizontal it could be you spend more time setting up the table of scan pointers.

Frankly, for more realistic situations, where you have not only colors, but widths, line-styles and end-styles, not to mention drawing modes like XOR, and aliasing, the way I've seen it done is

  1. each "line" is really a polygon-fill, for which there are pretty fast algorithms (which is actually what your algorithm is), and/or

  2. a special-purpose machine-language routine is generated on-the-fly (stored on the stack) because there are too many options to have option-specific special routines, and you don't want the algorithm continually questioning pixel-by-pixel what the options are.

Mike Dunlavey