views:

518

answers:

8

As the title sais I want to find a successive run of n one-bits in a bit-array of variable size (M).

The usual use-case is N <= 8 and M <= 128

I do this operation a lot in an innerloop on an embedded device. Writing a trivial implementation is easy but not fast enough for my taste (e.g. brute force search until a solution is found).

I wonder if anyone has a more elegant solution in his bag of tricks.

+2  A: 

Unroll the inner loop with a lookup table.

There are four classes of byte:

00000001 - // Bytes ending with one or more 1's.  These start a run.
11111111 - // All 1's.  These continue a run.
10000000 - // Bytes starting with 1's but ending with 0's.  These end a run.
10111000 - // All the rest.  These can be enders or short runs.

Make a lookup table that lets you distinguish these. Then process the bit array one byte at a time.

edit

I'd like to be a little less vague about the contents of the lookup table. In specific, I'll suggest that you need three tables, each with 256 entries, for the following characteristics:

Number of bits set.
Number of bits set before first zero.
Number of bits set after last zero.

Depending on how you do it, you may not need the first.

Steven Sudit
Also, check out: http://stackoverflow.com/questions/109023/best-algorithm-to-count-the-number-of-set-bits-in-a-32-bit-integer
Steven Sudit
That still leaves a few things unanswered. And he specified that he's on a DSP, so memory access is expensive and ALU ops are cheap. # bits set is just population count (`bitc4` instruction), # before first zero is just leading zeroes (`lmbd` instruction), and # after last zero is just reversed leading zeroes (`bitr lmbd`)
Potatoswatter
@Potatoswatter: I agree with your analysis. The solution I suggested is a reasonable one for a typical PC, but it's not the best answer for a DSP. In another comment, I already endorsed writing it in assembler.
Steven Sudit
+3  A: 
int nr = 0;    
for ( int i = 0; i < M; ++i )
{
  if ( bits[i] )
    ++nr;
  else
  {
    nr = 0; continue;
  }
  if ( nr == n ) return i - nr + 1; // start position    
}

What do you mean by brute force? O(M*N) or this O(M) solution? if you meant this, then I'm not sure how much more you can optimize things.

It's true we could achieve constant improvements by walking over every byte instead of every bit. This comes to mind: When I say byte I mean a sequence of N bits this time.

for ( int i = 0; i < M; i += N )
  if ( bits[i] == 0 ) // if the first bit of a byte is 0, that byte alone cannot be a solution. Neither can it be a solution in conjunction with the previous byte, so skip it.
    continue;
  else // if the first bit is 1, then either the current byte is a solution on its own or it is a solution in conjunction with the previous byte
  {
    // search the bits in the previous byte.
    int nrprev = 0;
    while ( i - nrprev >= 0 && bits[i - nrprev] ) ++nrprev;

    // search the bits in the current byte;
    int nrcurr = 0;
    while ( bits[i + nrcurr + 1] && nrcurr + nrprev <= N ) ++nrcurr;

    if ( nrcurr + nrprev >= N ) // solution starting at i - nrprev + 1.
      return i - nrprev + 1;   
  }

Not tested. Might need some additional conditions to ensure correctness, but the idea seems sound.

IVlad
You're right that you can't get asymptotically faster than O(M) (since you will always have to look at all the bits in the worst case), but since we're talking bits here, it may be possible to achieve a constant-factor improvement over this algorithm by working byte-at-a-time instead of bit-at-a-time.
Tyler McHenry
hey! This is what I was looking for.. And of course it can't be faster than O(M). perfect!
Nils Pipenbrinck
Tyler McHenry is right about it being possible to heuristically optimized. I have edited my post to present such a method.
IVlad
Start the first loop in the second code snippet at i = N-1. It's a small improvement, but every bit helps.
Justin Peel
Would you expect this to be faster than a bytewise lookup solution?
Steven Sudit
@Steven Sudit: I have no way of knowing. First of all, I am not sure in what way we could use any bitwise operators, because the OP said he has a bit array, which I understand to mean something like a C++ bitset. Second a lookup solution would use additional memory, which would be a disadvantage. Strictly speed-wise however, I couldn't say.
IVlad
Well, if we don't have raw access to the underlying bytes, then we're going to be pretty slow, regardless. The nice thing about C++ is that you can always cast. As for the additional memory, we're talking about a table with 256 entries, each of which is a byte.
Steven Sudit
+1  A: 

I do something similar on an embedded device running on a MIPS core. The MIPS architecture includes the CLZ instruction ("Count Leading Zeroes") which will return the number of leading zero-bits for the specified register. If you need to count the leading one-bits, simply invert the data before calling CLZ.

Example, assuming you have a C-language function CLZ as an alias for the assembly instruction:

unsigned numbits = 0, totalbits = 0;
while (data != 0 && numbits != N) {
    numbits = CLZ(data);  // count leading zeroes
    data <<= numbits;     // shift off leading zeroes
    totalbits += numbits; // keep track of how many bits we've shifted off
    numbits = CLZ(~data); // count leading ones
    data <<= numbits;     // shift off leading ones
    totalbits += numbits; // keep track of how many bits we've shifted off
}

At the end of this loop, totalbits will indicate the offset (in bits, from the left) of the first run of N consecutive one-bits. Each line inside the loop can be represented in a single assembly instruction (except the fourth line, which requires a second for the invert operation).

Other non-MIPS architectures may have similar instructions available.

bta
You don't need a CLZ opcode here: just make a lookup table that returns the right answers. Should be faster, too.
Steven Sudit
Thanks, nice idea. I have CLZ (a bunch of them to be exact).. And I know of at least one MIPS that has CLZ as well (just nitpicking :-)
Nils Pipenbrinck
Hey, if portability doesn't matter, code this in MIPS assembler and use all of the opcodes that optimize bit-twiddling. Can't beat that.
Steven Sudit
+1  A: 

Simple SWAR answer:

Given the value V you're inspecting, take N M-bit-wide registers. For all n in N, set register n to V >> n.

Dump bitwise AND(all N) into another M-wide register. Then simply find the bits set in that register and that will be the start of the an all-bits run.

I'm sure if you don't have an M-bit-wide registers you can adapt this to a smaller register size.

MSN
+1  A: 

This can be easily solved, and you don't need a count-zeroes instruction.

y = x ^ x-1

gives you a string of 1's up to the least-significant 1-bit in x.

y + 1

is the next individual bit which may be 1 or 0, and

x ^ x-(y+1)

gives you a string of 1's from that bit until the next 1-bit.

Then you can multiply the search pattern by (y+1) and recurse…

I'm working on an algorithm to fetch the strings… hold on…

Yeah… easily solved… while I'm working on that, note there's another trick. If you divide a word into substrings of n bits, then a series of ≥2n-1 1's must cover at least one substring. For simplicity, assume the substrings are 4 bits and words are 32 bits. You can check the substrings simultaneously to quickly filter the input:

const unsigned int word_starts = 0x11111111;
unsigned int word = whatever;
unsigned int flips = word + word_starts;
if ( carry bit from previous addition ) return true;
return ~ ( word ^ flips ) & word_starts;

This works because, after the addition operation, each bit (besides the first) in flips corresponding to a 1-bit in in word_starts equals (by the definition of binary addition)

word ^ carry_from_right ^ 1

and you can extract the carry bits by xoring with word again, negating, and ANDing. If no carry bits are set, a 1-string won't exist.

Unfortunately, you have to check the final carry bit, which C can't do but most processors can.

Potatoswatter
+1  A: 

If you're on an intel-compatible platform, the BSF (Bit Scan Forward) and BSR (Bit Scan Reverse) asm instructions could help you drop the first and last zero bits. This would be more efficient than the brute-force approach.

ggiroux
+1  A: 

This might be a bit over the top for what you are doing but I needed something heavyweight for a custom file system block allocation. If N < 32 then you can remove the second half of the the code.

For backward compatibility the most significant bit of the first word is regarded as bit 0.

Note that the algorithm uses a sentinel word (all zeros) at the end to stop any search rather than continually checking for end of array. Also note that the algorithm allows searching to start from any position in the bit array (typically the end of the last successful allocation) rather than always starting from the beginning of the bit array.

Supply your own compiler specific msbit32() function.

#define leftMask(x) (((int32_t)(0x80000000)) >> ((x) - 1))     // cast so that sign extended (arithmetic) shift used
#define rightMask(x) (1 << ((x) - 1))


/* Given a multi-word bitmap array find a run of consecutive set bits and clear them.
 *
 * Returns 0 if bitrun not found.
 *         1  if bitrun found, foundIndex contains the bit index of the first bit in the run (bit index 0 is the most significant bit of the word at lowest address).
 */

static int findBitRun(int runLen, uint32_t *pBegin, uint32_t *pStartMap, uint32_t *pEndMap, uint32_t *foundIndex)
{
    uint32_t *p = pBegin;
    unsigned int bit;

    if (runLen == 1)
    {    // optimise the simple & hopefully common case
        do {
            if (*p)
            {
                bit = msbit32(*p);
                *p &= ~(1 << bit);
                *foundIndex = ((p - pStartMap) * 32ul) + (31 - bit);
                return 1;
            }
            if (++p > pEndMap)
            {
                p = pStartMap;
            }
        } while (p != pBegin);
    }

    else if (runLen < 32)
    {
        uint32_t rmask = (1 << runLen) - 1;
        do {
            uint32_t map = *p;
            if (map)
            {
                // We want to find a run of at least runLen consecutive ones within the word.
                // We do this by ANDing each bit with the runLen-1 bits to the right
                // if there are any ones remaining then this word must have a suitable run.

                // The single bit case is handled above so can assume a minimum run of 2 required

                uint32_t w = map & (map << 1); // clobber any 1 bit followed by 0 bit
                int todo = runLen - 2;  // -2 as clobbered 1 bit and want to leave 1 bit

                if (todo > 2)
                {
                    w &= w << 2;      // clobber 2 bits
                    todo -= 2;

                    if (todo > 4)
                    {
                        w &= w << 4;      // clobber 4 bits
                        todo -= 4;
                        if (todo > 8)
                        {
                            w &= w << 8;      // clobber 8 bits
                            todo -= 8;
                        }
                    }
                }

                w &= w << todo;     // clobber any not accounted for

                if (w)              // had run >= runLen within word
                {
                    bit = msbit32(w); // must be start of left most run
                    *p &= ~(rmask << ((bit + 1) - runLen));
                    *foundIndex = ((p - pStartMap) * 32ul) + (31 - bit);
                    return 1;
                }
                else if ((map & 1) && (p[1] & 0x80000000ul))    // assumes sentinel at end of map
                {
                    // possibly have a run overlapping two words
                    // calculate number of bits at right of current word
                    int rbits = msbit32((map + 1) ^ map);
                    int lmask = rmask << ((32 + rbits) - runLen);
                    if ((p[1] | lmask) == p[1])
                    {
                        p[0] &= ~((1 << rbits) - 1);
                        p[1] &= ~lmask;
                        *foundIndex = ((p - pStartMap) * 32ul) + (32 - rbits);
                        return 1;
                    }
                }
            }
            if (++p > pEndMap)
            {
                p = pStartMap;
            }
        } while (p != pBegin);
    }
    else    // bit run spans multiple words
    {
        pEndMap -= (runLen - 1)/32;    // don't run off end
        if (pBegin > pEndMap)
        {
            pBegin = pStartMap;
        }

        do {
            if ((p[0] & 1) && ((p[0] | p[1]) == 0xfffffffful))   // may be first word of run
            {
                uint32_t map = *p;
                uint32_t *ps = p;      // set an anchor
                uint32_t bitsNeeded;
                int sbits;

                if (map == 0xfffffffful)
                {
                    if (runLen == 32)        // easy case
                    {
                        *ps = 0;
                        *foundIndex = (ps - pStartMap) * 32ul;
                        return 1;
                    }
                    sbits = 32;
                }
                else
                {
                    sbits = msbit32((map + 1) ^ map);
                }

                bitsNeeded = runLen - sbits;

                while (p[1] == 0xfffffffful)
                {
                    if (bitsNeeded <= 32)
                    {
                        p[1] = ~(0xfffffffful << (32 - bitsNeeded));
                        while (p != ps)
                        {
                            *p = 0;
                            --p;
                        }
                        *ps &= ~rightMask(sbits);
                        *foundIndex = ((p - pStartMap) * 32ul) + (32 - sbits);
                        return 1;
                    }
                    bitsNeeded -= 32;
                    if (++p == pBegin) 
                    {
                        ++pBegin;   // ensure we terminate
                    }
                }

                if ((bitsNeeded < 32) & (p[1] & 0x80000000ul))
                {
                    uint32_t lmask = leftMask(bitsNeeded);

                    if ((p[1] | lmask) == p[1])
                    {
                        p[1] &= ~lmask;
                        while (p != ps)
                        {
                            *p = 0;
                            --p;
                        }
                        *ps &= ~rightMask(sbits);
                        *foundIndex = ((p - pStartMap) * 32ul) + (32 - sbits);
                        return 1;
                    }
                }
            }

            if (++p > pEndMap)
            {
                p = pStartMap;
            }
        } while (p != pBegin);
    }

    return 0;
}
Dipstick
BTW I should point out that I wrote this about 5 years ago and no longer have access to the archived source - this is the most recent version I could find lying about - it looks OK but I might have made a couple of minor tweaks later (probably to do with use of the sentinel).For gcc, msbit32(x) can be defined as (31 - __builtin_clz(x))
Dipstick
+2  A: 

Hacker's Delight, chapter 6-2.

kotlinski