I need to scan for a 16 bit word in a bit stream. It is not guaranteed to be aligned on byte or word boundaries.

What is the fastest way of achieving this? There are various brute-force methods using tables and/or shifts, but are there any "bit twiddling shortcuts" that can cut down the number of calculations by giving a yes/no/maybe-contains-the-flag result for each byte or word as it arrives?

C code, intrinsics, x86 machine code would all be interesting.

+9  A: 

My money's on Knuth-Morris-Pratt with an alphabet of two characters.

Kinopiko
That seems to be for byte aligned characters in strings?
Stuart
The "alphabet of two characters" means that you turn the bytestream into a bitstream.
MSalters
The word Knuth automatically adds points to the answer. ;^) I think their algorithm has nothing to do with bit matching.
Toad
This performs poorly, as KMP relies on skipping forward. However, when searching for 10..... you can't skip anything. You have a mismatch when you encounter 11......, yet the string could be 110.... so must check the second 1 in the search string against the first 1 of your pattern. It does work well if your pattern is e.g. 11111.....; whenever you encounter a 0 you can skip 5 ahead. KMP works best if the alphabet is large, e.g. Unicode. A two-character alphabet is the worst case for KMP (but more search algorithms suffer from small alphabets)
MSalters
This is overkill for his problem because the KMP algorithm works for arbitrarily long words, where his word has a fixed length of 16 bits. KMP is too flexible for this problem and will not be as efficient as simpler approaches.
A. Levy
Answering the question exactly requires a lot of hard work, so I'll just call this a bet rather than an answer.
Kinopiko
The runtime overhead of pulling out one bit at a time would be comparable to the entire workload in a solution like reinier's and you haven't even started the KMP work at that point. On top of that, an alphabet of only two characters and a short, fixed-length search pattern is the worst case for KMP. Remember that there's more to life than big-O analysis.
Alan
+1  A: 

Seems like a good use for SIMD instructions. SSE2 added a bunch of integer instructions for crunching multiple integers at the same time, but I can't imagine many solutions for this that don't involve a lot of bit shifts since your data isn't going to be aligned. This actually sounds like something an FPGA should be doing.

ajs410
Emulating what an FPGA does in software may be an interesting place to start.
Stuart
All the FPGA would do is shift bits and compare, it's just that a circuit doesn't care about byte alignment, and it can shift the next bit in while it's comparing against the current bit.
ajs410
+1  A: 

Maybe you should stream your bit stream into a vector (vec_str), stream your pattern into another vector (vec_pattern), and then do something like the algorithm below:

j=0
while j+vec_pattern.length <= vec_str.length
    i=0
    while i<vec_pattern.length and not (vec_str[j+i] xor vec_pattern[i])
        i++
    if i == vec_pattern.length
        report a match at bit offset j
    j++

(hope the algorithm is correct)

fritzone
Using anything like classes at the bit level makes it terribly slow. Bitstreams will be no exception, especially if you are going to compare every bit separately.
Toad
I wouldn't bet against this without testing it. With C++ template expansion, this might well get optimized into something surprisingly fast.
Mark Bessey
mark: my experience is: always spell it out to the compiler and don't trust its optimization magic too much when you are going for the optimal cycle count. The human brain beats compilers anytime.
Toad
+5  A: 

I would implement a state machine with 16 states.

Each state represents how many received bits conform to the pattern. If the next received bit conforms to the next bit of the pattern, the machine steps to the next state. If this is not the case, the machine steps back to the first state (or to another state if the beginning of the pattern can be matched with a smaller number of received bits).

When the machine reaches the last state, this indicates that the pattern has been identified in the bit stream.
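
A minimal sketch of such a state machine in C, for illustration only. It assumes the bits arrive most significant bit first, both within the pattern and within each stream byte, and it builds the fallback transitions with the usual KMP-automaton construction. The names bit_dfa, bit_dfa_build and bit_dfa_scan are made up for this example.

```c
#include <stdint.h>
#include <stddef.h>

#define PATTERN_BITS 16

/* next_state[s][b]: state reached on input bit b when s pattern bits
   already match. State PATTERN_BITS means "full match". */
typedef struct {
    uint8_t next_state[PATTERN_BITS + 1][2];
} bit_dfa;

/* Bit k of the pattern in arrival order (assumed MSB first). */
static int pattern_bit(uint16_t pattern, int k)
{
    return (pattern >> (PATTERN_BITS - 1 - k)) & 1;
}

/* Build the transition table; `fallback` tracks the state the machine
   would be in had it started one bit later. */
static void bit_dfa_build(bit_dfa *dfa, uint16_t pattern)
{
    int fallback = 0;
    int first = pattern_bit(pattern, 0);
    dfa->next_state[0][first] = 1;
    dfa->next_state[0][!first] = 0;
    for (int s = 1; s <= PATTERN_BITS; s++) {
        for (int b = 0; b < 2; b++)
            dfa->next_state[s][b] = dfa->next_state[fallback][b];
        if (s < PATTERN_BITS) {
            int want = pattern_bit(pattern, s);
            dfa->next_state[s][want] = (uint8_t)(s + 1);
            fallback = dfa->next_state[fallback][want];
        }
    }
}

/* Feed the stream one bit at a time. Stores the starting bit offset of
   each match (up to max_hits) and returns the total number of matches. */
static size_t bit_dfa_scan(const bit_dfa *dfa, const uint8_t *data, size_t len,
                           size_t *hits, size_t max_hits)
{
    size_t count = 0;
    int state = 0;
    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            state = dfa->next_state[state][(data[i] >> bit) & 1];
            if (state == PATTERN_BITS) {
                if (count < max_hits)
                    hits[count] = i * 8 + (size_t)(7 - bit) + 1 - PATTERN_BITS;
                count++;
            }
        }
    }
    return count;
}
```

Because the table already encodes where to restart after a partial match, the scan never re-reads a bit. It still does one table lookup per input bit, which is the cost the comments below argue about.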

mouviciel
this would be dead slow!
Toad
This would be as fast as the input bit stream.
mouviciel
Probably not as fast as the versions that try to match multiple versions of the pattern at once, but plenty fast for any practical application.
Mark Bessey
This would likely be pretty fast and have the other advantage of being dead simple, so there is less chance for boundary-condition bugs. The only tricky part is handling partial matches - you have to reset the state 'back somewhere in the stream'. The simple method of just backing up the data stream could cause things to be slow for pathological streams (or patterns) where you get a lot of 15-bit matches. It's possible to build into the state machine the right state to restart matching without having to re-check bits, but that would make the state machine trickier to build (I think).
Michael Burr
This is tricky in the general case. But if the pattern is a constant of the problem, the state reset can be hardcoded. Or you can put the tricky part at initialisation step with a state-machine generation function taking the pattern as argument.
mouviciel
+16  A: 

I would precalculate all shifted values of the word and put them in 16 ints, so you get arrays like this:

 unsigned short pattern = 1234;
 unsigned int preShifts[16];
 unsigned int masks[16];
 int i;
 for(i=0; i<16; i++)
 {
      preShifts[i] = (unsigned int)(pattern<<i);  //gets promoted to int
      masks[i] = (unsigned int) (0xffff<<i);
 }

Then, for every unsigned short you get out of the stream, make an unsigned int out of that short and the previous short, and compare that unsigned int (masked each time) against the 16 pre-shifted values. If any of them match, you've got one.

so basically like this:

  int numMatch(unsigned short curWord, unsigned short prevWord)
  {
       int numHits = 0;
       unsigned int combinedWords = ((unsigned int)prevWord<<16) + curWord;

       int i=0;
       for(i=0; i<16; i++)
       {
             if((combinedWords & masks[i]) == preShifts[i]) numHits++;
       }
       return numHits;
  }

edit: do note that this could potentially mean multiple hits when the pattern is detected more than once on the same bits:

e.g. 32 bits of 0's and the pattern you want to detect is 16 0's, then it would mean the pattern is detected 16 times!
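
To make the multiple-hit case concrete, here is a small self-contained sketch of the same shift-and-mask check. It recomputes the masks inline instead of using precomputed tables, and the name count_alignments is invented for this example. It assumes the previous word sits in the high half of the 32-bit window.

```c
#include <stdint.h>

/* Count the alignments (shift 0..15) at which `pattern` appears inside the
   32-bit window formed by prev_word (high half) and cur_word (low half). */
static int count_alignments(uint16_t pattern, uint16_t prev_word, uint16_t cur_word)
{
    uint32_t window = ((uint32_t)prev_word << 16) | cur_word;
    int hits = 0;
    for (int shift = 0; shift < 16; shift++) {
        uint32_t mask = (uint32_t)0xFFFF << shift;      /* 16 bits of interest */
        uint32_t shifted = (uint32_t)pattern << shift;  /* pattern at this alignment */
        if ((window & mask) == shifted)
            hits++;
    }
    return hits;
}
```

With an all-zero window and an all-zero pattern, every one of the 16 alignments matches. A pattern that occurs only once in the window, such as 0xABCD inside 0x000ABCD0, is reported once.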

Toad
This is more along the lines of what I was thinking and quite possibly as good as it will get, especially if SIMD intrinsics allow multiple comparisons across multiple words. The "flag" is non-zero and unique within the bit stream.
Stuart
Cool. If you use SIMD you could potentially search even faster by putting the pattern a few times 'next to each other' in the huge registers. This way you can compare something like 4 unsigned ints at the same time.
Toad
Exactly. Of course I need to learn a bit more about SIMD.
Stuart
I think the general idea is workable, but the details need some refinement. A pattern might span up to 3 bytes, so the 'match' table would have to contain items that can hold 24-bit values (probably unsigned ints); you'd need to represent all of the possible values in the 'don't care' bits, so you'd need 16*256=4096 items in your match table. A binary search might be in order. Finally, you have to build the 3-byte value to look up on a byte-by-byte basis, not on a short-by-short basis. On the whole, I think that mouviciel's state machine approach would be simpler and possibly faster.
Michael Burr
michael: I don't think you understand the logic completely. The 16bit pattern is shifted 16 times into 16 unsigned ints creating every possible variation of the 16bit word. By getting 16 bits a time and masking this and checking it against the 16 possible patterns you have done all the checks you need.
Toad
@reinier - I see - you're right. But the point about working byte-by-byte through the input stream stands (I think). Otherwise you'd miss the situation where a pattern spanned a short boundary in the input stream.
Michael Burr
@michael: No it won't miss it since I'm checking unsigned int's at a time. So it doesn't matter where the pattern of 16bits is shifted inside this 32 bit unsigned int
Toad
Another slightly tricky thing that needs to be handled (related to getting multiple matches) is to handle the 'remaining' bits in the stream after a match (assuming that you're interested in continuing the search for further matches). I think this just requires that the next matching loop start at the correct offset in the combinedWords/masks arrays (the one after the index where the current match was found, assuming you don't want to consider any of the bits in the current match for the next match). Not rocket science, but something that might not be obvious at first glance.
Michael Burr
@reinier - damn- I should go back to sleep.
Michael Burr
`pattern` should be declared `1234U`, but other than that... this is pretty much what I would have done.
caf
A: 

atomice's

looked good until I considered Luke and MSalter's requests for more information about the particulars.

Turns out the particulars might indicate a quicker approach than KMP. The KMP article links to

for a particular case when the search pattern is 'AAAAAA'. For a multiple pattern search, the

might be most suitable.

You can find further introductory discussion here.

Ewan Todd
still... these all relate to text searches and nothing on bitlevel. Turning the bitstream into a bytestream first might be very costly (memory/processortime) and impractical.
Toad
Reinier, I'm concentrating on the string search algorithm here. You point out that the bitwise masking ops are not free. For now, I assume that they are comparably expensive for each algorithm. My main point, though, is that the specifics of the application may allow us to beat KMP.
Ewan Todd
KMP only can test 1 letter at a time. With bits you can test 16 (or with SIMD 64) at a time. This would make KMP or any other letter based algorithm useless.
Toad
I don't understand your insistence that KMP is restricted to byte sized ops. I have in mind a bitwise KMP, which shifts right by one bit on mismatch.
Ewan Todd
Google "Adapting the Knuth–Morris–Pratt algorithm for pattern matching in Huffman encoded texts"
Ewan Todd
+1  A: 

What I would do is create 16 prefixes and 16 suffixes. Then for each 16 bit input chunk determine the longest suffix match. You've got a match if the next chunk has a prefix match of length (16-N)

A suffix match doesn't actually take 16 comparisons. However, this takes pre-calculation based upon the pattern word. For example, if the pattern word is 1010101010101010, you can first test the last bit of your 16-bit input chunk. If that bit is 0, you only need to test the ...10101010 suffixes. If the last bit is 1, you need to test the ...1010101 suffixes. You've got 8 of each, for a total of 1+8 comparisons. If the pattern word is 1111111111110000, you'd still test the last bit of your input for a suffix match. If that bit is 1, you have to do 12 suffix matches (regex: 1{1,12}), but if it's 0 you have only 4 possible matches (regex: 1111 1111 1111 0{1,4}), again for an average of 9 tests. Add the 16-N prefix match, and you see that you only need 10 checks per 16-bit chunk.
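
A rough sketch of the suffix/prefix test itself, without the pattern-specific pruning described above (it simply tries every split N). It assumes chunks are read most significant bit first. The function names straddles and match_positions are hypothetical.

```c
#include <stdint.h>

/* Nonzero if the first n bits of the pattern are the last n bits of `chunk`
   and the remaining 16-n bits are the first 16-n bits of `next`.
   Valid for 1 <= n <= 15. */
static int straddles(uint16_t pattern, uint16_t chunk, uint16_t next, int n)
{
    uint32_t low_n = (1u << n) - 1;           /* selects the last n bits */
    uint32_t low_rest = (1u << (16 - n)) - 1; /* selects the last 16-n bits */
    int suffix_ok = (chunk & low_n) == (uint32_t)(pattern >> (16 - n));
    int prefix_ok = (uint32_t)(next >> n) == (pattern & low_rest);
    return suffix_ok && prefix_ok;
}

/* Returns a bit set: bit n (1..16) is set when the pattern begins n bits
   before the end of `chunk`. n == 16 means chunk == pattern. */
static uint32_t match_positions(uint16_t pattern, uint16_t chunk, uint16_t next)
{
    uint32_t found = 0;
    if (chunk == pattern)
        found |= 1u << 16;
    for (int n = 1; n <= 15; n++)
        if (straddles(pattern, chunk, next, n))
            found |= 1u << n;
    return found;
}
```

The pruning from the answer would replace the loop over all n with a precomputed list of candidate n values, chosen by the last bit (or last few bits) of the chunk.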

MSalters
Nice ideas in here. I'm going to mull over the problem for a little while and perhaps try this and the one from Reiner.
Stuart
+1  A: 

For a general-purpose, non-SIMD algorithm you are unlikely to be able to do much better than something like this:

unsigned int const pattern = pattern to search for
unsigned int accumulator = first three input bytes

do
{
  bool const found = ( ((accumulator   ) & ((1<<16)-1)) == pattern )
                   | ( ((accumulator>>1) & ((1<<16)-1)) == pattern )
                   | ( ((accumulator>>2) & ((1<<16)-1)) == pattern )
                   | ( ((accumulator>>3) & ((1<<16)-1)) == pattern )
                   | ( ((accumulator>>4) & ((1<<16)-1)) == pattern )
                   | ( ((accumulator>>5) & ((1<<16)-1)) == pattern )
                   | ( ((accumulator>>6) & ((1<<16)-1)) == pattern )
                   | ( ((accumulator>>7) & ((1<<16)-1)) == pattern );
  if( found ) { /* pattern found */ }
  accumulator >>= 8;

  unsigned int const data = next input byte
  accumulator |= (data<<16);  // refill the top byte of the 24-bit window
} while( there is input data left );
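
For anyone who wants to try the loop above, here is one possible runnable reading of it. It assumes the bits are numbered least significant bit first within each byte (bit 0 of byte 0 arrives first), which is what shifting the accumulator right implies. The name count_lsb_first and the counting interface are assumptions of this sketch, not part of the answer.

```c
#include <stdint.h>
#include <stddef.h>

/* Count occurrences of a 16-bit pattern at any bit offset in `data`,
   with bits numbered LSB first within each byte. */
static size_t count_lsb_first(const uint8_t *data, size_t len, uint16_t pattern)
{
    size_t count = 0;
    if (len < 2)
        return 0;                        /* fewer than 16 bits available */
    uint32_t acc = data[0] | ((uint32_t)data[1] << 8);
    for (size_t i = 2; ; i++) {
        int have_next = i < len;
        if (have_next)
            acc |= (uint32_t)data[i] << 16;   /* top byte of the 24-bit window */
        /* With a full window, 8 offsets are valid; at the very end only one. */
        int offsets = have_next ? 8 : 1;
        for (int s = 0; s < offsets; s++)
            if (((acc >> s) & 0xFFFF) == pattern)
                count++;
        if (!have_next)
            break;
        acc >>= 8;                       /* drop the oldest byte */
    }
    return count;
}
```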
moonshadow
your accumulator needs 3 input bytes in the init ;^)
Toad
@reinier: d'oh! Fixed.
moonshadow
+10  A: 

Here is a trick to speed up the search by a factor of 32, if neither the Knuth-Morris-Pratt algorithm on the alphabet of two characters {0, 1} nor reinier's idea are fast enough.

You can first use a table with 256 entries to check, for each byte in your bit stream, whether it is contained in the 16-bit word you are looking for. You build the table with

unsigned char table[256];
int i;
for (i=0; i<256; i++)
  table[i] = 0; // initialize with false
for (i=0; i<8; i++)
  table[(word >> i) & 0xff] = 1; // mark contained bytes with true

You can then find possible positions for matches in the bit stream using

for (i=0; i<length; i++) {
  if (table[bitstream[i]]) {
    // here comes the code which checks if there is really a match
  }
}

As at most 8 of the 256 table entries are nonzero, on average you have to take a closer look at only every 32nd position. Only for such a byte (combined with the bytes before and after it) do you then have to use bit operations or masking techniques, as suggested by reinier, to see whether there is a match.

The code assumes that you use little endian byte order. The order of the bits in a byte can also be an issue (known to everyone who has already implemented a CRC32 checksum).
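
Here is one way the filter and the follow-up check could fit together, as a sketch only. It assumes the stream is read most significant bit first, in which case a byte that passes the filter is the middle byte of a 24-bit window containing the word. The names build_filter and count_matches are invented for the example.

```c
#include <stdint.h>
#include <stddef.h>

/* Mark every byte value that occurs as an 8-bit window of the 16-bit word. */
static void build_filter(uint8_t table[256], uint16_t word)
{
    for (int v = 0; v < 256; v++)
        table[v] = 0;
    for (int k = 0; k < 8; k++)
        table[(word >> k) & 0xFF] = 1;
}

/* Count occurrences of `word` at any bit offset of an MSB-first stream. */
static size_t count_matches(const uint8_t *stream, size_t len, uint16_t word)
{
    uint8_t table[256];
    size_t count = 0;
    build_filter(table, word);
    for (size_t i = 1; i < len; i++) {
        if (!table[stream[i]])
            continue;                      /* cheap rejection, the common case */
        int has_next = i + 1 < len;
        uint32_t next = has_next ? stream[i + 1] : 0;
        /* 24-bit window: previous byte, candidate byte, following byte. */
        uint32_t window = ((uint32_t)stream[i - 1] << 16)
                        | ((uint32_t)stream[i] << 8) | next;
        /* Offset k > 0 reaches into the following byte, so it needs one. */
        int max_k = has_next ? 7 : 0;
        for (int k = 0; k <= max_k; k++)
            if (((window >> (8 - k)) & 0xFFFF) == word)
                count++;
    }
    return count;
}
```

Each match is examined exactly once, at the byte that follows its starting byte, so nothing is double counted. For an LSB-first stream the shifts in the verification step would need to be mirrored.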

Whoever
clever way to speed things up!
Toad
+3  A: 

You can use the fast Fourier transform for extremely large input (large n) to find any bit pattern in O(n log n) time. Compute the cross-correlation of a bit mask with the input. The cross-correlation of a sequence x and a mask y, of sizes n and n' respectively, is defined by

$$R(m) = \sum_{k=0}^{n'-1} x_{k+m} \, y_k$$

Occurrences of your bit pattern are then exactly the offsets m where R(m) = Y, where Y is the number of ones in your bit mask.

So if you are trying to match for the bit pattern

[0 0 1 0 1 0]

in

[ 1 1 0 0 1 0 1 0 0 0 1 0 1 0 1]

then you must use the mask

[-1 -1  1 -1  1 -1]

the -1's in the mask guarantee that those places must be 0.

You can implement cross-correlation, using the FFT in O(n log n ) time.

I think KMP has O(n + k) runtime, so it beats this out.
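
To check the example above by hand, here is a direct O(n * n') evaluation of R(m) in C. It deliberately does not use an FFT; it only illustrates the matching criterion R(m) = Y. The function names are made up for this sketch.

```c
#include <stddef.h>

/* Direct evaluation of R(m) = sum over k of x[k+m] * y[k]. */
static int cross_correlation(const int *x, const int *y, size_t mask_len, size_t m)
{
    int r = 0;
    for (size_t k = 0; k < mask_len; k++)
        r += x[k + m] * y[k];
    return r;
}

/* Stores each offset m where R(m) equals the number of +1 entries in the
   mask (up to max_out offsets) and returns how many such offsets exist. */
static size_t find_by_correlation(const int *x, size_t n,
                                  const int *y, size_t mask_len,
                                  size_t *out, size_t max_out)
{
    int ones = 0;
    size_t found = 0;
    for (size_t k = 0; k < mask_len; k++)
        if (y[k] == 1)
            ones++;
    for (size_t m = 0; m + mask_len <= n; m++) {
        if (cross_correlation(x, y, mask_len, m) == ones) {
            if (found < max_out)
                out[found] = m;
            found++;
        }
    }
    return found;
}
```

For the 15-bit input and the mask [-1 -1 1 -1 1 -1] given above, Y = 2 and the criterion is met at offsets 2 and 8, which are the two places where 0 0 1 0 1 0 occurs.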

ldog
Why would you use Fourier if this can be solved in O(n) time?
ralu
A: 

A fast way to find the matches in big bitstrings would be to calculate a lookup table that shows the bit offsets where a given input byte matches the pattern. Then combining three consecutive offset matches together you can get a bit vector that shows which offsets match the whole pattern. For example if byte x matches first 3 bits of the pattern, byte x+1 matches bits 3..11 and byte x+2 matches bits 11..16, then there is a match at byte x + 5 bits.

Here's some example code that does this, accumulating the results for two bytes at a time:

#include <stdio.h>

void output_match(int pos, unsigned short match);

void find_matches(unsigned char* sequence, int n_sequence, unsigned short pattern) {
    if (n_sequence < 2)
        return; // 0 and 1 byte bitstring can't match a short

    // Calculate a lookup table that shows for each byte at what bit offsets
    // the pattern could match.
    unsigned int match_offsets[256];
    for (unsigned int in_byte = 0; in_byte < 256; in_byte++) {
        match_offsets[in_byte] = 0xFF;
        for (int bit = 0; bit < 24; bit++) {
            match_offsets[in_byte] <<= 1;
            unsigned int mask = (0xFF0000 >> bit) & 0xFFFF;
            unsigned int match_location = (in_byte << 16) >> bit;
            match_offsets[in_byte] |= !((match_location ^ pattern) & mask);
        }
    }

    // Go through the input 2 bytes at a time, looking up where they match and
    // anding together the matches offset by one byte. Each bit offset then
    // shows if the input sequence is consistent with the pattern matching at
    // that position. This is anded together with the large offsets of the next
    // result to get a single match over 3 bytes.
    unsigned int curr, next;
    curr = 0;
    for (int pos = 0; pos < n_sequence-1; pos+=2) {
        next = ((match_offsets[sequence[pos]] << 8) | 0xFF) & match_offsets[sequence[pos+1]];
        unsigned short match = curr & (next >> 16);
        if (match)
            output_match(pos, match);
        curr = next;
    }
    // Handle the possible odd byte at the end
    if (n_sequence & 1) {
        next = (match_offsets[sequence[n_sequence-1]] << 8) | 0xFF;
        unsigned short match = curr & (next >> 16);
        if (match)
            output_match(n_sequence-1, match);
    }
}

void output_match(int pos, unsigned short match) {
    for (int bit = 15; bit >= 0; bit--) {
        if (match & 1) {
            printf("Bitstring match at byte %d bit %d\n", (pos-2) + bit/8, bit % 8);
        }
        match >>= 1;
    }
}

The main loop of this is 18 instructions long and processes 2 bytes per iteration. If the setup cost isn't an issue, this should be about as fast as it gets.

Ants Aasma
Trying to digest your code (not sure if I get it completely), I see one potential problem: if a byte is located at more than one place in the pattern, then your lookup table won't work, since it can only store one place in the pattern. So given the byte 00000000 and the pattern 1000000000000011, it should give 5 locations, but it can only give 1, resulting in a possible miss of the pattern. Or am I missing something?
Toad
I'll represent bits in least to most significant order. match_offsets[0x00] = 4286594816. That is 00000000 11111100 00000001 11111111, the last 9 bits are not significant, the 6 set bits in the middle represent that the byte can match at pattern >> 1, pattern >> 2 .. pattern >> 6. (for reference least significant bit means that byte matches at pattern << 7 masking out 7 low bits, 8'th bit or bit7 means match at pattern << 0 or that the given byte equals the low byte of the pattern)
Ants Aasma
+2  A: 
Vadakkumpadath
great that solutions still keep coming in. By the way, your name sounds awesome. Could be the name of a character in a star wars movie ;^)
Toad