views:

195

answers:

7

Hi, I come across a very tricky problem with bit manipulation.

As far as I know, the smallest variable size to hold a value is one byte of 8 bits. The bit operations available in C/C++ apply to an entire unit of bytes.

Imagine that I have a map to replace a binary pattern 100100 (6 bits) with a signal 10000 (5 bits). If the 1st byte of input data from a file is 10010001 (8 bits) being stored in a char variable, part of it matches the 6 bit pattern and therefore be replaced by the 5 bit signal to give a result of 1000001 (7 bits).

I can use a mask to manipulate the bits within a byte to get a result of the left most bits to 10000 (5 bit) but the right most 3 bits become very tricky to manipulate. I cannot shift the right most 3 bits of the original data to get the correct result 1000001 (7 bit) followed by 1 padding bit in that char variable that should be filled by the 1st bit of next followed byte of input.

I wonder if C/C++ can actually do this sort of replacement of bit patterns of length that do not fit into a Char (1 byte) variable or even Int (4 bytes). Can C/C++ do the trick or we have to go for other assembly languages that deal with single bits manipulations?

I heard that Power Basic may be able to do the bit-by-bit manipulation better than C/C++.

+1  A: 
  • << shiftleft
  • ^ XOR
  • >> shift right
  • ~ one's complement

Using these operations, you could easily isolate the pieces that you are interested in and compare them as integers.

say the byte 001000100 and you want to check if it contains 1000:

char k = (char)68;
char c = (char)8;
int i = 0;
while(i<5){
    if((k<<i)>>(8-3-i) == c){
        //do stuff
        break;
    }
}

This is very sketchy code, just meant to be a demonstration.

piggles
Thanks for your quick feedback. But my intention is to reduce the size of incoming data by replacing certain patterns with shorter signals when reading a stream of input data. It is like reading your 01000100 and replaced with 1000, i.e discard the 1st 0 and the last 3 digits 100 out of the output stream. Once I store them in a unit of bytes, it becomes hard to manipulate.
I once had Java code to do that (huffman coding), but I think it is lost. The trick that I came across was to keep an `int` beside your character that told you how many bits were currently in the char. What you need is a buffer to store all the incoming data and a function that will feed you the data one bit at a time by using the shift left. I'm no C expert, but perhaps you could use a data structure to store the raw data in a buffer...
piggles
Yes, this is quite similar to Huffman short codes to replace 8 bit codes to represent the characters. Good advice. I will see what I can do with it.
+1  A: 

If time and space are not important then you can convert the bits to a string representation and perform replaces on the string, then convert back when needed. Not an elegant solution but one that works.

Will A
Thanks. The main purpose of this game is to process the binary stream of input data and produce an output stream of data with smaller size like for every 8 bits, I discard 2 bits and output only 6 bits.
Are the typical means of compression e.g. LZ not suitable for what you're after?
Will A
This is quite similar but I am thinking about do it in real time with input data stream without scanning the whole file first.
LZ doesn't require the whole file to be scanned - it can be performed in real-time, at least, you should be able to find many implementations of zip/deflate *streams* that will accept bytes output compressed bytes in near real-time.
Will A
Oh, I need to check it out. Thanks
+1  A: 

I wonder if C/C++ can actually do this sort of replacement of bit patterns of length that do not fit into a Char (1 byte) variable or even Int (4 bytes).

What about std::bitset?

erjot
A `bitset`'s size cannot be changed at runtime.
strager
@stranger o, rly? unfortunately size of char or int cannot be changed at runtime either. what a pitty
erjot
@trickyricky, It seems like the OP is asking for a 'bit stream' of sorts; perhaps you saw things differently than I did.
strager
@stranger to me it seems that OP is asking whether there exist data structures which can store more than 8-64 bits and, what is most important, size of this structure can be multiplication of pattern size used in search (no need for taking care of spare bits left after single comparison)
erjot
+1. In context of the question about C++ structure, I think std::bitset is the answer. Looking at its interface it seems to be useful enough: e.g. `<<` and `>>` operators return shifted copy.
Dummy00001
The other way to describe my problem is, for example, to process 1 byte input data stream each time but output less than 8 bit data in real time. Can it be done?
A: 

Use a vector<bool> if you can read your data into the vector mostly at once. It may be more difficult to find-and-replace sequences of bits, though.

strager
+1  A: 

Here's a small bit reader class which may suit your needs. Of course, you may want to create a bit writer for your use case.

#include <iostream>
#include <sstream>
#include <cassert>

class BitReader {
    public:
        typedef unsigned char BitBuffer;

        BitReader(std::istream &input) :
            input(input), bufferedBits(8) {
        }

        BitBuffer peekBits(int numBits) {
            assert(numBits <= 8);
            assert(numBits > 0);

            skipBits(0);    // Make sure we have a non-empty buffer

            return (((input.peek() << 8) | buffer) >> bufferedBits) & ((1 << numBits) - 1);
        }

        void skipBits(int numBits) {
            assert(numBits >= 0);

            numBits += bufferedBits;

            while (numBits > 8) {
                buffer = input.get();
                numBits -= 8;
            }

            bufferedBits = numBits;
        }

        BitBuffer readBits(int numBits) {
            assert(numBits <= 8);
            assert(numBits > 0);

            BitBuffer ret = peekBits(numBits);

            skipBits(numBits);

            return ret;
        }

        bool eof() const {
            return input.eof();
        }

    private:
        std::istream &input;
        BitBuffer buffer;
        int bufferedBits;   // How many bits are buffered into 'buffer' (0 = empty)
};
strager
Thanks. Let me try.
A: 

If I understood your questions correctly, you have an input stream and and output stream and you want to replace the 6bits of the input with 5 in the output - and your output still should be a bit stream?

So, the most important programmer's rule can be applied: Divide et impera! You should split your component in three parts:

  1. Input Stream converter: Convert every pattern in the input stream to a char array (ring) buffer. If I understood you correctly your input "commands" are 8bit long, so there is nothing special about this.

  2. Do the replacement on the ring buffer in a way that you replace every matching 6-bit pattern with the 5bit one, but "pad" the 5 bit with a leading zero, so the total length is still 8bit.

  3. Write an output handler that reads from the ring buffer and let this output handler write only the 7 LSB to the output stream from each input byte. Of course some bit manipulation is necessary again for this. If your ring buffer size can be divided by 8 and 7 (= is a multiple of 56) you will have a clean buffer at the end and can start again with 1.

The most simplest way to implement this is to iterate over this 3 steps as long as input data is available.

If a performance really matters and you are running on a multi-core CPU you even could split the steps and 3 threads, but then you must carefully synchronize the access to the ring buffer.

IanH
Thanks for your comment. I will see what I can do.
A: 

I think the following does what you want.

PATTERN_LEN = 6
PATTERNMASK = 0x3F //6 bits
PATTERN     = 0x24 //b100100
REPLACE_LEN = 5
REPLACEMENT = 0x10  //b10000


void compress(uint8* inbits, uint8* outbits, int len)
{
  uint16 accumulator=0;
  int nbits=0;
  uint8 candidate; 

  while (len--) //for all input bytes
  {
    //for each bit (msb first)
    for (i=7;i<=0;i--)
    {
      //add 1 bit to accumulator
      accumulator<<=1;
      accumulator|=(*inbits&(1<<i));  
      nbits++;
      //check for pattern
      candidate = accumulator&PATTERNMASK;
      if (candidate==PATTERN)
      {
        //remove pattern
        accumulator>>=PATTERN_LEN; 
        //add replacement
        accumulator<<=REPLACE_LEN; 
        accumulator|=REPLACMENT;
        nbits+= (REPLACE_LEN - PATTERN_LEN);
      }
    }
    inbits++;
    //move accumulator to output to prevent overflow
    while (nbits>8)
    {
      //copy the highest 8 bits
      nbits-=8;      
      *outbits++ = (accumulator>>nbits)&0xFF;
      //clear them from accumulator
      accumulator&= ~(0xFF<<nbits);
    }
  }
  //copy remainder of accumulator  to output
  while (nbits>0)
  {
    nbits-=8;
    *outbits++ = (accumulator>>nbits)&0xFF;
    accumulator&= ~(0xFF<<nbits);
  }

}

You could use a switch or a loop in the middle to check the candidate against multiple patterns. There might have to be some special handling after doing a replacment to ensure the replacement pattern is not re-checked for matches.

AShelly