ansaurus

Question

Fast threshold and bit packing algorithm (possible improvements?)

Answer 1

+2 A:

Try something like:

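/* assumed declarations: const unsigned char *src; unsigned char *out; unsigned thres, w; */
unsigned i, w8=w>>3, x;
for (i=0; i<w8; i++) {
    /* bit 8 of thres-src[k] is the borrow: set when src[k] > thres (thres <= 255) */
    x  = (thres-src[0])>>1 & 0x80;
    x |= (thres-src[1])>>2 & 0x40;
    x |= (thres-src[2])>>3 & 0x20;
    x |= (thres-src[3])>>4 & 0x10;
    x |= (thres-src[4])>>5 & 0x08;
    x |= (thres-src[5])>>6 & 0x04;
    x |= (thres-src[6])>>7 & 0x02;
    x |= (thres-src[7])>>8 & 0x01;
    out[i] = x;
    src += 8;
}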

You can figure out the extra code for the remainder at the end of the line if the width is not a multiple of 8, or you could just pad/align the source to ensure it's a multiple of 8.

R.. 2010-09-14 01:49:08
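
The remainder handling mentioned in the answer above could look roughly like the sketch below. The function name `pack_row` and its signature are hypothetical (they are not from the answer), and the sketch assumes `unsigned` threshold arithmetic with a threshold in 0..255, so that bit 8 of `thres - src[k]` acts as the borrow bit. Leftover pixels are packed MSB-first into one extra byte whose unused low bits are left at 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the answer): thresholds one row of `w` pixels
   and packs the results MSB-first, 8 pixels per output byte. A bit is set
   when src[k] > thres. `out` must hold (w + 7) / 8 bytes. */
static void pack_row(const uint8_t *src, uint8_t *out, unsigned w, unsigned thres)
{
    unsigned i, k, w8 = w >> 3, rem = w & 7, x;

    assert(thres <= 255u);              /* the borrow trick needs an 8-bit threshold */

    for (i = 0; i < w8; i++) {          /* full groups of 8 pixels */
        x = 0;
        for (k = 0; k < 8; k++)         /* move borrow bit 8 down to bit 7-k */
            x |= ((thres - src[k]) >> (k + 1)) & (0x80u >> k);
        out[i] = (uint8_t)x;
        src += 8;
    }
    if (rem) {                          /* 1..7 leftover pixels */
        x = 0;
        for (k = 0; k < rem; k++)
            x |= ((thres - src[k]) >> (k + 1)) & (0x80u >> k);
        out[w8] = (uint8_t)x;           /* unused low bits stay 0 */
    }
}
```

The inner loops are written out for brevity; a compiler will usually unroll them, but the hand-unrolled form in the answer avoids relying on that.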

Are you sure those shifts couldn't go from 0 to 7, rather than 1 to 8 (assuming that the threshold and src are both 8 bit values)?

caf 2010-09-14 04:50:10

Yes, I chose the correct shift values. I'm shifting down bit 8, not bit 7, because I want the borrow bit from the integer result. Bit 7 could be 0 or 1 regardless of whether `thres-src[k]` wrapped modulo `UINT_MAX+1`.

R.. 2010-09-14 12:36:40
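
To make the bit 8 versus bit 7 point concrete, here is a small sketch. The helper names `borrow_bit` and `bit7` are hypothetical, and both operands are assumed to be `unsigned` values in 0..255.

```c
#include <assert.h>

/* Hypothetical helpers for illustration: extract bit 8 (the borrow) and
   bit 7 of thres - src, with both operands in 0..255. */
static unsigned borrow_bit(unsigned thres, unsigned src)
{
    return (thres - src) >> 8 & 1u;     /* 1 only when the subtraction wrapped */
}

static unsigned bit7(unsigned thres, unsigned src)
{
    return (thres - src) >> 7 & 1u;     /* depends on the magnitude of the difference */
}
```

With thres = 200 and src = 10 the difference is 190 (0xBE): bit 7 is set even though src is below the threshold, while bit 8 is clear. With thres = 10 and src = 200 the subtraction wraps, the low byte of the result is 0x42, so bit 7 is clear and bit 8 is set.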

Answer 2

A:

You can do this with SSE quite easily, processing 16 pixels at a time, e.g.

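load vector (16 x 8 bit unsigned)
XOR each element with 0x80 so that unsigned order becomes signed order
compare (PCMPGTB) against (threshold XOR 0x80), giving 0xFF where pixel > threshold
use PMOVMSKB to extract sign bits into 16 bit word
store 16 bit word

Example code using SSE intrinsics (warning: untested!):

#include <emmintrin.h>   // SSE2 intrinsics
#include <limits.h>      // CHAR_BIT
#include <stdint.h>

void threshold_and_pack(
    const uint8_t * in_image,       // input image, 16 byte aligned, height rows x width cols, width = multiple of 16
    uint8_t * out_image,            // output image, 2 byte aligned, height rows x width/8 bytes per row
    const uint8_t threshold,        // threshold: output bit is 1 where pixel > threshold
    const int width,
    const int height)
{
    const __m128i vSignFlip = _mm_set1_epi8((char)0x80);
    const __m128i vThreshold = _mm_set1_epi8((char)(threshold ^ 0x80));
    int i, j;

    for (i = 0; i < height; ++i)
    {
        const __m128i * p_in = (const __m128i *)&in_image[i * width];
        uint16_t * p_out = (uint16_t *)&out_image[i * width / CHAR_BIT];

        for (j = 0; j < width; j += 16)
        {
            __m128i v = _mm_load_si128(p_in);
            uint16_t b;

            // a plain wrapping 8 bit add of (255 - threshold) would lose the carry,
            // so do an unsigned compare via sign flip + signed compare instead
            v = _mm_xor_si128(v, vSignFlip);        // map unsigned 0..255 onto signed -128..127, preserving order
            v = _mm_cmpgt_epi8(v, vThreshold);      // 0xFF where pixel > threshold, else 0x00
            b = (uint16_t)_mm_movemask_epi8(v);     // use PMOVMSKB to pack sign bits into 16 bit word (pixel k -> bit k)

            *p_out = b;

            p_in++;
            p_out++;
        }
    }
}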

Paul R 2010-09-14 10:55:45
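
Since the SSE code is flagged as untested, a scalar reference is handy for checking its output. The sketch below is hypothetical (the name `threshold_and_pack_ref` is not from the answer). It assumes the intended predicate is pixel > threshold, as in Answer 1, and it mirrors the layout PMOVMSKB produces when the 16 bit mask is stored on a little-endian machine: pixel j of a row ends up in bit (j % 8) of output byte (j / 8), i.e. LSB-first, which is the opposite bit order from Answer 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference (not from the answer): bit (j % 8) of
   output byte (j / 8) is set when pixel j of the row exceeds the threshold.
   width must be a multiple of 8; out_image holds height * width / 8 bytes. */
static void threshold_and_pack_ref(const uint8_t *in_image, uint8_t *out_image,
                                   uint8_t threshold, int width, int height)
{
    size_t row_bytes = (size_t)width / 8;
    int i, j;

    assert(width % 8 == 0);
    memset(out_image, 0, row_bytes * (size_t)height);

    for (i = 0; i < height; ++i) {
        const uint8_t *p_in = in_image + (size_t)i * (size_t)width;
        uint8_t *p_out = out_image + (size_t)i * row_bytes;

        for (j = 0; j < width; ++j)
            if (p_in[j] > threshold)
                p_out[j / 8] |= (uint8_t)(1u << (j % 8));
    }
}
```

Running both versions over random images and comparing the output buffers with `memcmp` is a cheap way to catch an off-by-one in the threshold or a bit-order mix-up.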
