ansaurus

Question

Answer 1

+2 A:

Unroll the inner loop with a lookup table.

There are four classes of byte:

00000001 - // Bytes ending with one or more 1's.  These start a run.
11111111 - // All 1's.  These continue a run.
10000000 - // Bytes starting with 1's but ending with 0's.  These end a run.
10111000 - // All the rest.  These can be enders or short runs.

Make a lookup table that lets you distinguish these. Then process the bit array one byte at a time.

edit

I'd like to be a little less vague about the contents of the lookup table. In specific, I'll suggest that you need three tables, each with 256 entries, for the following characteristics:

Number of bits set.
Number of bits set before first zero.
Number of bits set after last zero.

Depending on how you do it, you may not need the first.

Steven Sudit 2010-02-11 20:45:42

Also, check out: http://stackoverflow.com/questions/109023/best-algorithm-to-count-the-number-of-set-bits-in-a-32-bit-integer

Steven Sudit 2010-02-11 21:08:47

That still leaves a few things unanswered. And he specified that he's on a DSP, so memory access is expensive and ALU ops are cheap. # bits set is just population count (`bitc4` instruction), # before first zero is just leading zeroes (`lmbd` instruction), and # after last zero is just reversed leading zeroes (`bitr lmbd`)

Potatoswatter 2010-02-13 00:33:16

@Potatoswatter: I agree with your analysis. The solution I suggested is a reasonable one for a typical PC, but it's not the best answer for a DSP. In another comment, I already endorsed writing it in assembler.

Steven Sudit 2010-02-15 15:50:05

Answer 2

+3 A:

int nr = 0;    
for ( int i = 0; i < M; ++i )
{
  if ( bits[i] )
    ++nr;
  else
  {
    nr = 0; continue;
  }
  if ( nr == n ) return i - nr + 1; // start position    
}

What do you mean by brute force? O(M*N) or this O(M) solution? if you meant this, then I'm not sure how much more you can optimize things.

It's true we could achieve constant improvements by walking over every byte instead of every bit. This comes to mind: When I say byte I mean a sequence of N bits this time.

for ( int i = 0; i < M; i += N )
  if ( bits[i] == 0 ) // if the first bit of a byte is 0, that byte alone cannot be a solution. Neither can it be a solution in conjunction with the previous byte, so skip it.
    continue;
  else // if the first bit is 1, then either the current byte is a solution on its own or it is a solution in conjunction with the previous byte
  {
    // search the bits in the previous byte.
    int nrprev = 0;
    while ( i - nrprev >= 0 && bits[i - nrprev] ) ++nrprev;

    // search the bits in the current byte;
    int nrcurr = 0;
    while ( bits[i + nrcurr + 1] && nrcurr + nrprev <= N ) ++nrcurr;

    if ( nrcurr + nrprev >= N ) // solution starting at i - nrprev + 1.
      return i - nrprev + 1;   
  }

Not tested. Might need some additional conditions to ensure correctness, but the idea seems sound.

IVlad 2010-02-11 20:47:58

You're right that you can't get asymptotically faster than O(M) (since you will always have to look at all the bits in the worst case), but since we're talking bits here, it may be possible to achieve a constant-factor improvement over this algorithm by working byte-at-a-time instead of bit-at-a-time.

Tyler McHenry 2010-02-11 20:52:58

hey! This is what I was looking for.. And of course it can't be faster than O(M). perfect!

Nils Pipenbrinck 2010-02-11 20:58:49

Tyler McHenry is right about it being possible to heuristically optimized. I have edited my post to present such a method.

IVlad 2010-02-11 21:03:43

Start the first loop in the second code snippet at i = N-1. It's a small improvement, but every bit helps.

Justin Peel 2010-02-11 21:13:55

Would you expect this to be faster than a bytewise lookup solution?

Steven Sudit 2010-02-11 21:25:21

@Steven Sudit: I have no way of knowing. First of all, I am not sure in what way we could use any bitwise operators, because the OP said he has a bit array, which I understand to mean something like a C++ bitset. Second a lookup solution would use additional memory, which would be a disadvantage. Strictly speed-wise however, I couldn't say.

IVlad 2010-02-11 21:31:52

Well, if we don't have raw access to the underlying bytes, then we're going to be pretty slow, regardless. The nice thing about C++ is that you can always cast. As for the additional memory, we're talking about a table with 256 entries, each of which is a byte.

Steven Sudit 2010-02-11 22:18:52

Answer 3

+1 A:

I do something similar on an embedded device running on a MIPS core. The MIPS architecture includes the CLZ instruction ("Count Leading Zeroes") which will return the number of leading zero-bits for the specified register. If you need to count the leading one-bits, simply invert the data before calling CLZ.

Example, assuming you have a C-language function CLZ as an alias for the assembly instruction:

unsigned numbits = 0, totalbits = 0;
while (data != 0 && numbits != N) {
    numbits = CLZ(data);  // count leading zeroes
    data <<= numbits;     // shift off leading zeroes
    totalbits += numbits; // keep track of how many bits we've shifted off
    numbits = CLZ(~data); // count leading ones
    data <<= numbits;     // shift off leading ones
    totalbits += numbits; // keep track of how many bits we've shifted off
}

At the end of this loop, totalbits will indicate the offset (in bits, from the left) of the first run of N consecutive one-bits. Each line inside the loop can be represented in a single assembly instruction (except the fourth line, which requires a second for the invert operation).

Other non-MIPS architectures may have similar instructions available.

bta 2010-02-11 21:01:11

You don't need a CLZ opcode here: just make a lookup table that returns the right answers. Should be faster, too.

Steven Sudit 2010-02-11 21:06:43

Thanks, nice idea. I have CLZ (a bunch of them to be exact).. And I know of at least one MIPS that has CLZ as well (just nitpicking :-)

Nils Pipenbrinck 2010-02-11 21:16:51

Hey, if portability doesn't matter, code this in MIPS assembler and use all of the opcodes that optimize bit-twiddling. Can't beat that.

Steven Sudit 2010-02-11 21:18:20

Answer 4

+1 A:

Simple SWAR answer:

Given the value V you're inspecting, take N M-bit-wide registers. For all n in N, set register n to V >> n.

Dump bitwise AND(all N) into another M-wide register. Then simply find the bits set in that register and that will be the start of the an all-bits run.

I'm sure if you don't have an M-bit-wide registers you can adapt this to a smaller register size.

MSN 2010-02-11 21:11:13

Answer 5

+1 A:

This can be easily solved, and you don't need a count-zeroes instruction.

y = x ^ x-1

gives you a string of 1's up to the least-significant 1-bit in x.

y + 1

is the next individual bit which may be 1 or 0, and

x ^ x-(y+1)

gives you a string of 1's from that bit until the next 1-bit.

Then you can multiply the search pattern by (y+1) and recurse…

I'm working on an algorithm to fetch the strings… hold on…

Yeah… easily solved… while I'm working on that, note there's another trick. If you divide a word into substrings of n bits, then a series of ≥2n-1 1's must cover at least one substring. For simplicity, assume the substrings are 4 bits and words are 32 bits. You can check the substrings simultaneously to quickly filter the input:

const unsigned int word_starts = 0x11111111;
unsigned int word = whatever;
unsigned int flips = word + word_starts;
if ( carry bit from previous addition ) return true;
return ~ ( word ^ flips ) & word_starts;

This works because, after the addition operation, each bit (besides the first) in flips corresponding to a 1-bit in in word_starts equals (by the definition of binary addition)

word ^ carry_from_right ^ 1

and you can extract the carry bits by xoring with word again, negating, and ANDing. If no carry bits are set, a 1-string won't exist.

Unfortunately, you have to check the final carry bit, which C can't do but most processors can.

Potatoswatter 2010-02-12 03:17:49

Answer 6

+1 A:

If you're on an intel-compatible platform, the BSF (Bit Scan Forward) and BSR (Bit Scan Reverse) asm instructions could help you drop the first and last zero bits. This would be more efficient than the brute-force approach.

ggiroux 2010-02-12 03:41:39

Answer 7

+1 A:

This might be a bit over the top for what you are doing but I needed something heavyweight for a custom file system block allocation. If N < 32 then you can remove the second half of the the code.

For backward compatibility the most significant bit of the first word is regarded as bit 0.

Note that the algorithm uses a sentinel word (all zeros) at the end to stop any search rather than continually checking for end of array. Also note that the algorithm allows searching to start from any position in the bit array (typically the end of the last successful allocation) rather than always starting from the beginning of the bit array.

Supply your own compiler specific msbit32() function.

#define leftMask(x) (((int32_t)(0x80000000)) >> ((x) - 1))     // cast so that sign extended (arithmetic) shift used
#define rightMask(x) (1 << ((x) - 1))


/* Given a multi-word bitmap array find a run of consecutive set bits and clear them.
 *
 * Returns 0 if bitrun not found.
 *         1  if bitrun found, foundIndex contains the bit index of the first bit in the run (bit index 0 is the most significant bit of the word at lowest address).
 */

static int findBitRun(int runLen, uint32_t *pBegin, uint32_t *pStartMap, uint32_t *pEndMap, uint32_t *foundIndex)
{
    uint32_t *p = pBegin;
    unsigned int bit;

    if (runLen == 1)
    {    // optimise the simple & hopefully common case
        do {
            if (*p)
            {
                bit = msbit32(*p);
                *p &= ~(1 << bit);
                *foundIndex = ((p - pStartMap) * 32ul) + (31 - bit);
                return 1;
            }
            if (++p > pEndMap)
            {
                p = pStartMap;
            }
        } while (p != pBegin);
    }

    else if (runLen < 32)
    {
        uint32_t rmask = (1 << runLen) - 1;
        do {
            uint32_t map = *p;
            if (map)
            {
                // We want to find a run of at least runLen consecutive ones within the word.
                // We do this by ANDing each bit with the runLen-1 bits to the right
                // if there are any ones remaining then this word must have a suitable run.

                // The single bit case is handled above so can assume a minimum run of 2 required

                uint32_t w = map & (map << 1); // clobber any 1 bit followed by 0 bit
                int todo = runLen - 2;  // -2 as clobbered 1 bit and want to leave 1 bit

                if (todo > 2)
                {
                    w &= w << 2;      // clobber 2 bits
                    todo -= 2;

                    if (todo > 4)
                    {
                        w &= w << 4;      // clobber 4 bits
                        todo -= 4;
                        if (todo > 8)
                        {
                            w &= w << 8;      // clobber 8 bits
                            todo -= 8;
                        }
                    }
                }

                w &= w << todo;     // clobber any not accounted for

                if (w)              // had run >= runLen within word
                {
                    bit = msbit32(w); // must be start of left most run
                    *p &= ~(rmask << ((bit + 1) - runLen));
                    *foundIndex = ((p - pStartMap) * 32ul) + (31 - bit);
                    return 1;
                }
                else if ((map & 1) && (p[1] & 0x80000000ul))    // assumes sentinel at end of map
                {
                    // possibly have a run overlapping two words
                    // calculate number of bits at right of current word
                    int rbits = msbit32((map + 1) ^ map);
                    int lmask = rmask << ((32 + rbits) - runLen);
                    if ((p[1] | lmask) == p[1])
                    {
                        p[0] &= ~((1 << rbits) - 1);
                        p[1] &= ~lmask;
                        *foundIndex = ((p - pStartMap) * 32ul) + (32 - rbits);
                        return 1;
                    }
                }
            }
            if (++p > pEndMap)
            {
                p = pStartMap;
            }
        } while (p != pBegin);
    }
    else    // bit run spans multiple words
    {
        pEndMap -= (runLen - 1)/32;    // don't run off end
        if (pBegin > pEndMap)
        {
            pBegin = pStartMap;
        }

        do {
            if ((p[0] & 1) && ((p[0] | p[1]) == 0xfffffffful))   // may be first word of run
            {
                uint32_t map = *p;
                uint32_t *ps = p;      // set an anchor
                uint32_t bitsNeeded;
                int sbits;

                if (map == 0xfffffffful)
                {
                    if (runLen == 32)        // easy case
                    {
                        *ps = 0;
                        *foundIndex = (ps - pStartMap) * 32ul;
                        return 1;
                    }
                    sbits = 32;
                }
                else
                {
                    sbits = msbit32((map + 1) ^ map);
                }

                bitsNeeded = runLen - sbits;

                while (p[1] == 0xfffffffful)
                {
                    if (bitsNeeded <= 32)
                    {
                        p[1] = ~(0xfffffffful << (32 - bitsNeeded));
                        while (p != ps)
                        {
                            *p = 0;
                            --p;
                        }
                        *ps &= ~rightMask(sbits);
                        *foundIndex = ((p - pStartMap) * 32ul) + (32 - sbits);
                        return 1;
                    }
                    bitsNeeded -= 32;
                    if (++p == pBegin) 
                    {
                        ++pBegin;   // ensure we terminate
                    }
                }

                if ((bitsNeeded < 32) & (p[1] & 0x80000000ul))
                {
                    uint32_t lmask = leftMask(bitsNeeded);

                    if ((p[1] | lmask) == p[1])
                    {
                        p[1] &= ~lmask;
                        while (p != ps)
                        {
                            *p = 0;
                            --p;
                        }
                        *ps &= ~rightMask(sbits);
                        *foundIndex = ((p - pStartMap) * 32ul) + (32 - sbits);
                        return 1;
                    }
                }
            }

            if (++p > pEndMap)
            {
                p = pStartMap;
            }
        } while (p != pBegin);
    }

    return 0;
}

Dipstick 2010-02-12 05:24:26

BTW I should point out that I wrote this about 5 years ago and no longer have access to the archived source - this is the most recent version I could find lying about - it looks OK but I might have made a couple of minor tweaks later (probably to do with use of the sentinel).For gcc, msbit32(x) can be defined as (31 - __builtin_clz(x))

Dipstick 2010-02-12 07:16:38

Answer 8

+2 A:

Hacker's Delight, chapter 6-2.

kotlinski 2010-02-12 07:12:41

ansaurus

tags:

views:

answers:

find string of N 1-bits in a bit-array

related questions