Hi all,

I have an array of unsigned chars. Basically I have an array of bits.

I know that the first 16 bits correspond to an unsigned integer, and I retrieve its value using (u16)(*(buffer + 1) << 8 | *buffer)

Then comes a data type called u30, which is described as follows: u30 - variable length encoded 30-bit unsigned integer value. The variable encoding for u30 uses one to five bytes, depending on the magnitude of the value encoded. Each byte contributes its low seven bits to the value. If the high (8th) bit of a byte is set then the next byte is also part of the value.

I don't understand this description: it says u30 (thirty bits!) and then it says 1 to 5 bytes? I also have another data type called s24 - three-byte signed integer value.

How should one read (retrieve their values) such non-typical data types? Any help will be appreciated.
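For the s24 type, one possible reading is sketched below. It assumes the three bytes are stored least significant byte first, matching the 16-bit read above, and that the value is two's complement; neither is stated in the description quoted here. The name read_s24 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: reads a three-byte signed integer.
   ASSUMPTION: least significant byte first, two's complement. */
static int32_t read_s24(const unsigned char *p)
{
    /* Assemble the 24 raw bits into an unsigned value. */
    uint32_t raw = (uint32_t)p[0]
                 | ((uint32_t)p[1] << 8)
                 | ((uint32_t)p[2] << 16);

    /* Bit 23 is the sign bit: if it is set, subtract 2^24 to sign-extend. */
    if (raw & 0x800000u)
        return (int32_t)raw - 0x1000000;
    return (int32_t)raw;
}
```

With this sketch the bytes FF FF FF decode to -1 and the bytes 02 01 00 decode to 258.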

Thanks a lot!

+2  A: 

Assuming I understand correctly (always a questionable matter), the following will read the values. It starts at position zero in this example (i would need to be offset by the actual position in the buffer):

   unsigned int val;
   unsigned char buf[300];
   int i;
   int shift;

   i = 0;

   buf[0] = 0x81;
   buf[1] = 0x3;
   val = 0;
   shift = 0;
   do
      {
      val |= (unsigned int)(0x7F & buf[i]) << shift;   /* cast avoids signed overflow once shift reaches 28 */
      shift += 7;
      i++;
      } while (( buf[i-1] & 0x80 ) && ( i < 5 ));
   printf( "Val = %u\n", val );
Mark Wilkins
Maybe add an error check to detect the case where the 5th byte has too many bits set, so the result doesn't fit in 30 bits?
David Gelhar
Ambiguous in the OP is whether the "machine" is big- or little-endian. If it is big-endian, you'd just shift val left by 7 each time through the loop, rather than shifting the current byte up by n*7 bits. One also has to wonder what is supposed to happen if byte 5 has any of its top 6 bits set (as David says).
dash-tom-bang
Your point about the endianness is correct. I did think about that and (admittedly) just took the easy way out. I simply made a guess based on how I would have written the encoding (write the 7 low bits, shift right, extract the next 7 bits, etc.).
Mark Wilkins
+1  A: 

The encoding format description is perhaps somewhat informal, but it should be enough. The idea is that you read one byte (call it x), take its lowest 7 bits with x & 0x7F, and at the same time check whether its highest bit is set. You'll need to write a small loop that merges the 7-bit groups into a uint variable until the current byte no longer has its highest bit set.

You will have to figure out if you need to merge the new bits at the high end, or the low end of the number (a = (a << 7) | (x & 0x7F)). For that you need one test sequence of which you know what the correct output is.
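A minimal sketch of the two merge orders, to run against a test sequence whose decoded value is known. The function names are hypothetical, and max_len is an assumed cap on the number of bytes (at most 5 for a u30).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the first byte holds the LOWEST 7 bits of the value.
   max_len is assumed to be at most 5. */
static uint32_t merge_low_first(const unsigned char *p, size_t max_len)
{
    uint32_t a = 0;
    size_t i;
    for (i = 0; i < max_len; i++) {
        a |= (uint32_t)(p[i] & 0x7F) << (7 * i);  /* new bits go above the old ones */
        if (!(p[i] & 0x80))                       /* high bit clear: last byte */
            break;
    }
    return a;
}

/* Hypothetical helper: the first byte holds the HIGHEST 7 bits of the value. */
static uint32_t merge_high_first(const unsigned char *p, size_t max_len)
{
    uint32_t a = 0;
    size_t i;
    for (i = 0; i < max_len; i++) {
        a = (a << 7) | (uint32_t)(p[i] & 0x7F);   /* new bits go below the old ones */
        if (!(p[i] & 0x80))                       /* high bit clear: last byte */
            break;
    }
    return a;
}
```

For the two bytes 0x81 0x03, the low-first merge gives 385 and the high-first merge gives 131, so a single known value is enough to tell the two orders apart.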

jdv
+4  A: 
i=0;    
val = buf[i]&0x7F;
while (buf[i++]&0x80)
{ 
  val |= (buf[i]&0x7F)<<(i*7);
}
AShelly
A: 

To read the variable-length 30-bit value, you could do something like this:

const unsigned char HIGH_BIT = 0x80;  // continuation flag
const unsigned char DATA_MASK = 0x7F; // low 7 bits carry data
const unsigned char LAST_MASK = 0x03; // only need 2 bits of last byte
unsigned char tmpValue = 0; // tmp holder for value of byte
unsigned int value = 0; // holder for the actual value
unsigned char* ptr = buffer; // assume buffer is at the start of the 30 bit number
for(int i = 0; i < 5; i++)
{
   if(i == 4)
   {
      tmpValue = LAST_MASK & *ptr;
   }
   else
   {
      tmpValue = DATA_MASK & *ptr;
   }

   value |= tmpValue << ( 7 * i);

   if(!(HIGH_BIT & *ptr))
   {
      break;
   }
   if(i != 4)
   {
     ++ptr;
   }
}
buffer = ptr + 1; // ptr is left on the last byte consumed, so step past it

@Mark: your answer was posted while I was typing this, and it would work except for the high byte. The value is only 30 bits, so only the low 2 bits of the fifth byte belong to the value, whereas you are using all 7 of its data bits.
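If you would rather detect a bad fifth byte than silently mask it, a stricter variant could look like the sketch below. It assumes the low-bits-first order used above and treats stray bits in the fifth byte as an error, which the description does not actually specify. The name read_u30 and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one u30 from p, reading at most avail bytes.
   Returns the number of bytes consumed (1 to 5), or 0 if the input is
   truncated or the fifth byte has bits set outside its low 2 bits.
   ASSUMPTION: the first byte holds the lowest 7 bits of the value. */
static size_t read_u30(const unsigned char *p, size_t avail, uint32_t *out)
{
    uint32_t value = 0;
    size_t i;

    for (i = 0; i < 5 && i < avail; i++) {
        unsigned char byte = p[i];

        /* 4 * 7 = 28 bits are already used, so the fifth byte may carry
           only 2 value bits and must not ask for another byte. */
        if (i == 4 && (byte & 0xFC))
            return 0;

        value |= (uint32_t)(byte & 0x7F) << (7 * i);

        if (!(byte & 0x80)) {   /* high bit clear: this was the last byte */
            *out = value;
            return i + 1;
        }
    }
    return 0;                   /* ran out of input before the value ended */
}
```

The return value doubles as the amount by which to advance the buffer pointer after a successful read.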

diverscuba23
Yes indeed - I just thought of that a bit (heh heh) ago and came back to fix it.
Mark Wilkins