views:

449

answers:

7

Does anyone know of an optimized way of detecting a 37 bit sequence in a chunk of binary data that is optimal. Sure I can do a brute force compare using windowing (just compare starting with index 0+next 36 bits, increment and loop until i find it) but is there a better way? Maybe some hashing search that returns a probability that the sequence lies within a binary chunk? Or am I just pulling that out of my butt? Anyway, I'm going ahead with the brute force search, but I was curious if there was something more optimal. This is in C by the way.

+1  A: 

If you are analyzing the first N bits for a pattern, then it shouldn't be difficult to determine from which bit to continue pattern search based on the first M bits which certainly cannot be part of the pattern (if the pattern is such that it can be determined).

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
<--            N bits           -->
<--   'ugly' M bits    -->|<-- continue here

That should cut it somewhat shorter.

Of course, one of the most efficient methods is to parse the input with a state machine like a DFA, but it seems like an overkill. Depends on your usage scenario.

kek444
+6  A: 

You could treat the bits as if they were characters from a {0,1} alphabet and run any of several relatively efficient known substring-search algorithms on the data.

Tyler McHenry
Cool, I'll look through those tonight!
dude
+4  A: 
James Youngman
If you were really speed obsessed and wanted to brute force it like this you need to evaluate the 8 (word == w_A) terms outside the if statements and then do a big if (matcha | matchb | matchc | ...) which contains sub-if statements. This way all the logic can be done in parallel rather than forcing the processor do do all these branches in order (which is what else-if statements will do).
Tom Leys
A: 

If the pattern you are looking for is fix, you can build a series of arrays which are shifts in the masks to do the comparisons. To do the comparision, use a xor function and if 0 is returned, it matches. Any other value is not a match. This would allow for checking the bytes in a string, as long as there is at least 2 byte left in the array. With 2 bytes left, you would not be able to increment full eight bits. An example for 17 bits is below, but is the same idea. (I am looking for all ones as it was easy to work with in shifting the bits for demonstration)

/* Data is passed in, and offset is the number of bits offset from the first
   bit where the mask is located
   returns true if match was found.
*/
bool checkData(char* data, int* offset)
{
    /* Mask to mask off the first bits  not being used or examined*/
    static char firstMask[8] = { 0xFF, 0x7F, 0x3F, 0x1F, 0x0F, 0x07, 0x03, 0x01 };

    /* Mask to mask off the end bits not used  or examined*/
    static char endMask[8] = { 0x80, 0xC0, 0xE0, 0x0F, 0xF8, 0xFC, 0xFE, 0xFF };

    /* Pattern which is being search, with each row being the about shifted and 
       columns contain the pattern to be compared.  for example index 0 is a 
       shift of 0 bits in the pattern and 7 is a shift of seven bits
       NOTE: Bits not being used are set to zero.  
    */
    static char pattern[8][3] = { { 0xFF, 0xFF, 0x80 },  /* Original pattern */
                                  { 0x8F, 0xFF, 0xC0 },  /* Shifted by one */
                                  { 0x3F, 0xFF, 0xE0 },  /* Shifted by two */
                                  { 0x1F, 0xFF, 0xF0 },
                                  { 0x0F, 0xFF, 0xF8 },
                                  { 0x07, 0xFF, 0xFC },
                                  { 0x03, 0xFF, 0xFE },
                                  { 0x01, 0xFF, 0xFF }}; /* shifted by seven */

    /* outer loop control variable */
    int lcv;

    /* inter loop control variable */
    int lcv2;

    /* value to to contain the value results */
    char value;

    /* if there is no match, pass back a negative number to indicate no match */
    *offset = -1;

    /* Loop through the shifted patterns looking for a match */
    for ( lcv = 0; lcv < 8 ; lcv++ ) 
    {
        /* check the first part of the pattern.  
           mask of part that is not to be check and xor it with the 
           first part of the pattern */

        value = (firstMask[lcv] & *data) ^ pattern[lcv][0];
        /* if value is not zero, no match, so goto the next */
        if ( 0 != value ) 
        {
            continue;
        }

        /* loop through the middle of the pattern make sure it matches
           if it does not, break the loop
           NOTE:  Adjust the condition to match 1 less then the number 
                  of 8 bit items  you are comparing
        */
        for ( lcv2 = 1; lcv2 < 2; lcv2++)
        {
            if ( 0 != (*(data+lcv2)^pattern[lcv][lcv2]))
            {
                break;
            }
        }

        /* if the end of the loop was not reached, pattern 
           does not match, to continue to the next one
           NOTE: See note above about the condition 
        */   
        if ( 2 != lcv2)
        {
          continue;
        }

        /* Check the end of the pattern to see if there is a match after masking
           off the bits which are not being checked.
        */  
        value = (*(data + lcv2) & endMask[lcv]) ^ pattern[lcv][lcv2];

        /* if value is not zero, no match so continue */
        if ( 0 != value ) 
        {
          continue;
        }
    }
    /* If the end of the loop was not reached, set the offset as it 
       is the number of bits the pattern is offset in the byte and 
       return true
    */
    if ( lcv < 8 ) 
    {
        *offset = lcv ;
        return true;
    }
    /* No match was found */
    return false;
}

This requires the user to supply a pointer to the data and to call it for the next byte. The user needs to make sure they are not going to run over the end of the data in the pattern matching.

It there is not match early in the pattern, it will not go on checking the rest of the bits, which should help the search time.

This implementation should be fairly portable, but it will require some rework for 37 bits.

Glenn
A: 

Given any byte B, you want to ask what positions, if any, it could occupy in the 37-bit sequence. Then

  1. You keep a set of possible positions for the current byte, which starts empty.
  2. If you see a byte with position 0, you add 0 to the set.
  3. If you see a byte with position 1..7, you mask and check the preceding byte, and if it's OK, add the current position to the set.
  4. To move to a new byte, check each position in the set, add 8 to it, and ask if the new byte could appear in that position. When you get to a position of at least 29, you have won an all-expenses-paid trip to a successful search.

You can make this fast through table lookup, although the exact data structures to be used are open to experimentation. Since you have 256 bytes and 8 initial positions, you can store the initial positions in an array of 256 bytes, hoping the common case of all 0s is frequent. This should make the cost for steps 2 and 3 either O(1) or O(8), in either case a small constant.

For the later position checks I think you want to index by position rather than by byte, so you'll need an array of 29 bytes (one for each position from 8..36). This check is O(1) times the number of currently active positions.

This seems like fun; let us know how you make out.

Norman Ramsey
I intend on answering my own question now and in the future so people can benefit. Thanks! I think that's only common hospitality to let people know if something worked or not. It is good to conserve bits, so people don't have to ask questions a million times and can just do a search on stack.
dude
A: 

Here's one way: finding your single target bitstring can be reduced to finding any of a certain set of bytestrings. There are fast methods for that problem, like Aho-Corasick, which searches in one pass, never backing up, in time independent of the size of the target set.

(The set of bytestrings equals each of 8 shifts of the bitstring, filled out with all possible padding bits where needed on the first and last bytes. I think that works out to 1024 of them.)

Darius Bacon
A: 

Hi, i know nothing about programming or C or windowing... nothing too deep about computers... but i'm very interested in: given a pseudo-random binary sequence (e.g.: 00101010010101) of finite values, predict how will the sequence continue. Can someone please tell me the easiest way to do it? Or in case it's too difficult for someone who can barely play solitaire on its computer, can someone tell me where to give my first steps...I'm pretty nerdy. PS: can this technique be used to precict the colour of the next electronic roulette number (e.g.: asigning 1 and 0 to red and black respectively)?

Thanks!

Sorry about my english...i know it sucks

Rocket Power