Because of memory constraints, I have to store some pairs of values in an array using 6 bits per pair (3 bits per value). The problem comes when I want to access this array like a normal one, by the index of the pair. The array looks like this:

|--byte 0--|--byte 1--|--byte 2--|
| 00000011 | 11112222 | 22333333 |   ... and so on, the pattern repeats
  (each digit shows which pair the bit belongs to)

 => 4 pairs / 3 bytes
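(For clarity, here is a rough sketch of how a pair would be written into this layout. SetPair below is only an illustration of the packing, not code I actually use; it writes the bits MSB-first within each byte, as in the diagram.)

void SetPair(unsigned char *array, int index, int value1, int value2) {
    // Illustration only: stores one pair in the packed layout described above,
    // writing the 6 bits MSB-first, one bit at a time.
    unsigned pair = ((value1 & 7) << 3) | (value2 & 7); // value1 in the high 3 bits
    unsigned start = index * 6;                         // absolute bit offset of the pair
    for (int i = 0; i < 6; ++i) {
        unsigned bit = (pair >> (5 - i)) & 1;           // take bits from the MSB down
        unsigned pos = start + i;
        unsigned byteIdx = pos / 8, shift = 7 - pos % 8;
        array[byteIdx] = (array[byteIdx] & ~(1u << shift)) | (bit << shift);
    }
}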

You can see that for pairs whose index modulo 4 is 1 or 2, the 6 bits straddle a byte boundary, so two bytes are required to extract the values.
I wrote a function that, given a pair index, returns the first value of the pair (3 bits) and the second one (also 3 bits).

void GetPair(char *array, int index, int &value1, int &value2) {
    int groupIndex = index >> 2; // divide by 4 to get the index of the 3-byte group (holding 4 pairs)
    // We read 16 bits starting at the first byte of the group when the pair's position
    // within the group (index % 4) is 0 or 1, and starting at the second byte when it is 2 or 3.
    // Note: the read assumes the first byte is the most significant (big-endian),
    // matching the MSB-first packing shown in the diagram above.
    short int value = *(short int *)(array + groupIndex * 3 + ((index & 0x02) >> 1));

    switch(index & 0x03) { // index % 4
        case 0: { 
            // extract first 3 bits
            value1 = (value & 0xE000) >> 13;
            // extract the next 3 bits
            value2 = (value & 0x1C00) >> 10;
            break;
        }
        case 1: {
            value1 = (value & 0x380) >> 7;
            value2 = (value & 0x70) >> 4;
            break;
        }
        case 2: {
            value1 = (value & 0xE00) >> 9;
            value2 = (value & 0x1C0) >> 6;
            break;
        }
        case 3: {
            value1 = (value & 0x38) >> 3;
            value2 = value & 0x7;
            break;
        }
    }
}

Now my question is: Is there any faster method to extract these values?

I made a test: when using 2 bytes/pair (1 byte/value) it takes about 6 seconds to access all pairs (53 in total) 100 million times. When using the compact array it takes about 22 seconds :( (probably because of all those masks and bit shifts).
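The access pattern I am timing is roughly the following (just a sketch; the harness name and the repetition count are illustrative, not my exact test code):

long long BenchmarkGetPair(char *array, int pairCount, int repetitions) {
    int v1, v2;
    long long sum = 0;               // accumulate something so the calls aren't optimized away
    for (int r = 0; r < repetitions; ++r)
        for (int i = 0; i < pairCount; ++i) {
            GetPair(array, i, v1, v2);
            sum += v1 + v2;
        }
    return sum;
}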
I tried to explain as clearly as I could... forgive me if it isn't clear.

+1  A: 

Modern architectures don't even address single bytes anymore; they address 4-byte words and extract the piece you requested. So on stock hardware, you might see an improvement by using 4 bytes per pair and extracting the two values yourself. Four bytes per value might be faster still, but the cost of loading the second word is probably greater than the cost of masking and shifting. Or maybe not; modern processors are weird. Profile it and see!
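(Not from the answer, just to make the idea concrete: with 4 bytes per pair, the extraction could be as simple as the sketch below. The packing of the pair into the low 6 bits of each word is an arbitrary choice, and GetPairFromWords is a made-up name.)

// Sketch: one pair per aligned 32-bit word, value1 in bits 5..3, value2 in bits 2..0;
// the particular packing is arbitrary, the point is a single aligned load per pair.
void GetPairFromWords(const unsigned int *array, int index, int &value1, int &value2) {
    unsigned int word = array[index];
    value1 = (word >> 3) & 7;
    value2 = word & 7;
}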

David Seiler
It's more accurate to say that, while modern processors still use bytes as the fundamental unit for their address space, their wide data busses and cache lines mean that if you only ask for a single byte, the CPU will probably end up loading an entire word into cache anyways. Determining the optimal data size for something like this is highly dependent on the architecture, which is probably not x86 given the extreme memory constraints.
+2  A: 

This is a classic case of trading off speed for memory efficiency. I assume that you are working in an environment where memory is scarce and you need to shove many, many items into this array, otherwise this probably isn't worth your time.

You can eliminate the switch statement by using a lookup table to find the correct shift and mask values.

short int shift1[4] = { 13, 7, 9, 3 };
short int shift2[4] = { 10, 4, 6, 0 };
unsigned short int mask1[4] = { 0xe000, 0x0380, 0x0e00, 0x0038 }; /* unsigned so 0xe000 doesn't sign-extend */
unsigned short int mask2[4] = { 0x1c00, 0x0070, 0x01c0, 0x0007 };

int i = index % 4; /* you're not saving any time by using bitwise AND but you are making your code less readable */
value1 = (value & mask1[i]) >> shift1[i];
value2 = (value & mask2[i]) >> shift2[i];
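Dropped into the original function, the table-driven version might look like this (just a sketch; it keeps the question's big-endian 16-bit read and 3-byte group addressing):

void GetPair(char *array, int index, int &value1, int &value2) {
    static const short int shift1[4] = { 13, 7, 9, 3 };
    static const short int shift2[4] = { 10, 4, 6, 0 };
    static const unsigned short int mask1[4] = { 0xe000, 0x0380, 0x0e00, 0x0038 };
    static const unsigned short int mask2[4] = { 0x1c00, 0x0070, 0x01c0, 0x0007 };

    int i = index & 0x03;  // position of the pair within its 3-byte group
    unsigned short value =
        *(unsigned short *)(array + (index >> 2) * 3 + ((index & 0x02) >> 1));
    value1 = (value & mask1[i]) >> shift1[i];
    value2 = (value & mask2[i]) >> shift2[i];
}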

The idea is that you eliminate any branching. However each path is so short that it just may not matter. In my testing (gcc on a PowerPC) there was barely any difference. However, the memory bandwidth on this machine is slow enough that both versions are faster than just using direct array accesses and 1 byte per value.

Matt Kane
This saves about 4 seconds in my test! Thanks.
lgratian
+2  A: 

How about this? It eliminates memory accesses for the masks and shift values. (Of course, the (non-portable) assumption is that char is 8 bits and short is 16 bits. It is also assumed that index * 6 does not overflow int.)

void GetPair(char *array, int index, int &value1, int &value2)
{
   // The pair starts at bit index * 6; shifting right by (10 - bit offset within
   // the starting byte) lines the 6 bits up at the bottom of the 16-bit read.
   // Like the code in the question, this relies on a big-endian 16-bit load.
   unsigned shift = 10 - index * 6 % 8;
   unsigned short data = (*(unsigned short *)(array + index * 6 / 8) >> shift) & 0x3f;
   value2 = data & 7;   // low 3 bits of the pair
   value1 = data >> 3;  // high 3 bits of the pair
}

There might be a penalty for reading a short that crosses a 16-bit boundary (an unaligned read), though. There used to be such issues way back when I was still keeping track of such things. If that is the case, it would probably be better to read a 32-bit value starting at a 16-bit boundary and adjust the shifts and masks accordingly.
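Such a variant might look like the sketch below (same big-endian caveat as above, plus the assumption that unsigned int is 32 bits; note it can read a couple of bytes past the last pair, so the buffer needs a little slack at the end):

void GetPair32(char *array, int index, int &value1, int &value2)
{
   unsigned bit = index * 6;        // absolute bit offset of the pair
   unsigned shift = 26 - bit % 16;  // pair sits in bits [shift+5 .. shift] of the 32-bit read
   unsigned data = (*(unsigned int *)(array + bit / 16 * 2) >> shift) & 0x3f;
   value2 = data & 7;
   value1 = data >> 3;
}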

Ari
Mine: 25550 ms, Matt Kane: 23926 ms, Ari: 23114 ms
lgratian
Yeah, that's what I was really going for, but my brain wasn't quite working :)
Matt Kane
I would suggest testing the 32 bit (or even 64 bit - depending on your hardware) approach as well. My experience is that these things often give counterintuitive results owing to the complexities of the caching and pipelining systems.
Ari