I have a stream of 16-bit values, and I need to overwrite the 4 least significant bits of each sample. The new values are different for each short, but repeat every X shorts, essentially tagging each short with an ID.

Are there any bit twiddling tricks to do this faster than just a for-loop?

More details: I'm converting a file from one format to another. It is currently implemented with FILE*, but I could use Windows-specific APIs if that would help.

[while data remaining]
{
   read X shorts from input
   tag 4 LSB's
   write modified data to output
}
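For reference, the loop above might look roughly like this in C with FILE*. This is only a sketch: `tag_stream`, `MAX_PERIOD`, and the `tags` array (one 4-bit tag per position in the repeat period) are hypothetical names, and it assumes the file holds raw 16-bit samples in the machine's native byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: copies 16-bit samples from `in` to `out`, replacing
   the 4 LSBs of the k-th sample with tags[k % period].
   Returns the number of samples written. */
size_t tag_stream(FILE *in, FILE *out, const uint16_t *tags, size_t period)
{
    enum { MAX_PERIOD = 4096 };          /* assumed upper bound on X */
    uint16_t buf[MAX_PERIOD];
    size_t total = 0, n;

    assert(period > 0 && period <= MAX_PERIOD);

    /* Each read starts on a period boundary, so buf[i] pairs with tags[i]. */
    while ((n = fread(buf, sizeof buf[0], period, in)) > 0) {
        for (size_t i = 0; i < n; ++i)
            buf[i] = (uint16_t)((buf[i] & 0xFFF0u) | (tags[i] & 0x000Fu));
        if (fwrite(buf, sizeof buf[0], n, out) != n)
            break;                       /* write error: stop early */
        total += n;
    }
    return total;
}
```

Reading several periods per fread call would cut the call overhead further; one period per call is used here only to mirror the pseudocode.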

In addition to bulk operations, I guess I was looking for opinions on the best way to stomp those last 4 bits.

  1. Shift right 4, shift left 4, | in the new values
  2. & in my zero bits, then | in the 1 bits
  3. Subtract the value modulus 16, then add the new value
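To make the comparison concrete, here is a small sketch of the three options as C functions. The function names are made up for illustration. It assumes unsigned 16-bit samples and a tag in the range 0..15, and reads option 3 as "subtract the remainder modulo 16, then add the tag". Under those assumptions all three give the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: shift the low nibble out and back in, then OR in the tag. */
static uint16_t tag_shift(uint16_t s, uint16_t tag)
{
    assert(tag <= 0xFu);
    return (uint16_t)(((s >> 4) << 4) | tag);
}

/* Option 2: AND with a mask that clears the low nibble, then OR in the tag. */
static uint16_t tag_mask(uint16_t s, uint16_t tag)
{
    assert(tag <= 0xFu);
    return (uint16_t)((s & 0xFFF0u) | tag);
}

/* Option 3: subtract the remainder modulo 16, then add the tag. */
static uint16_t tag_mod(uint16_t s, uint16_t tag)
{
    assert(tag <= 0xFu);
    return (uint16_t)(s - (s % 16u) + tag);
}
```

Option 2 is the only one that carries over directly to several samples packed into one wider integer, since the shift and modulus forms would mix bits across sample boundaries.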

We're only supporting win7 (32 or 64) right now, so hardware would be whatever people choose for that.

+3  A: 

If you're working on e.g. a 32-bit platform, you can do them 2 at a time. Or on a modern x86 equivalent, you could use SIMD instructions to operate on 128 bits at a time.

Other than that, there are no bit-twiddling methods to avoid looping through your entire data set, given that it sounds like you must modify every element!
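As a rough illustration of the SIMD idea, the sketch below uses SSE2 intrinsics to mask and tag 8 shorts per iteration. The helper name `tag_sse2` is hypothetical, and it makes simplifying assumptions that are not in the question: an x86 target with SSE2, a tag period of exactly 8, and a sample count that is a multiple of 8.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: tags `n` samples in place.
   Assumes the tags repeat every 8 shorts and n is a multiple of 8. */
void tag_sse2(uint16_t *samples, size_t n, const uint16_t tags8[8])
{
    assert(n % 8 == 0);

    /* -16 as a 16-bit value is 0xFFF0: keeps the top 12 bits of each lane. */
    const __m128i keep = _mm_set1_epi16(-16);
    const __m128i tags = _mm_loadu_si128((const __m128i *)tags8);

    for (size_t i = 0; i < n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(samples + i));
        v = _mm_or_si128(_mm_and_si128(v, keep), tags);
        _mm_storeu_si128((__m128i *)(samples + i), v);
    }
}
```

Unaligned loads and stores are used so the buffer needs no special alignment. Whether this beats a plain loop in practice probably depends on whether the conversion is limited by the CPU or by file I/O.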

Oli Charlesworth
Just try to avoid branches in the loop, especially ones that depend on the data being read.
ruslik
Read as much of the data file as you can into memory as well, or use something that buffers it for you (which ifstream supposedly does).
JH
Would working on int* really be faster than working on a short* for the same block of data?
Tom
@Tom: If the native data width can handle processing two (or more) items at a time, then potentially yes! It's potentially twice as fast, depending on whether you're limited by CPU or by memory (or I/O). Of course, this is only going to work if you go with the bit-mask approach.
Oli Charlesworth
A: 

Best way to stomp those last 4 bits is your option 2:

uint16_t i;    /* one 16-bit sample */
i &= 0xFFF0;   /* clear the 4 LSBs */
i |= tag;      /* tag must be in the range 0..15 */

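Doing this on a 64-bit integer can be faster if you know the tag values in advance. Note that on Windows a long is only 32 bits, even in 64-bit builds, so use uint64_t (or unsigned long long) rather than long. You can memcpy 4 shorts into one 64-bit integer and then do the same operations as above on 4 shorts at a time:

uint64_t l;
l &= 0xFFF0FFF0FFF0FFF0ULL;
l |= tags;

where tags = ((uint64_t) tag1 << 48) | ((uint64_t) tag2 << 32) | ((uint64_t) tag3 << 16) | (uint64_t) tag4; The parentheses matter, because + binds tighter than << in C. On little-endian x86 the first short in memory lands in the low 16 bits, so with this layout tag4 applies to the first short and tag1 to the last. This makes sense if you reuse the tags value often, not if you have to build it differently for each set of 4 shorts.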
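A minimal sketch of the 4-at-a-time version is below. The names `pack_tags` and `tag_four` are hypothetical. It uses uint64_t and builds the packed tags with memcpy rather than shifts, so each tag lines up with the matching short in memory regardless of byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Packs four 4-bit tags so that tags4[k] lines up with the k-th short
   in memory. Built once and reused for every group of 4 samples. */
uint64_t pack_tags(const uint16_t tags4[4])
{
    uint64_t packed;
    for (int k = 0; k < 4; ++k)
        assert(tags4[k] <= 0xFu);
    memcpy(&packed, tags4, sizeof packed);
    return packed;
}

/* Clears the 4 LSBs of four consecutive samples and ORs in the tags. */
void tag_four(uint16_t samples[4], uint64_t packed_tags)
{
    uint64_t word;
    memcpy(&word, samples, sizeof word);   /* avoids aliasing problems */
    word &= 0xFFF0FFF0FFF0FFF0ULL;
    word |= packed_tags;
    memcpy(samples, &word, sizeof word);
}
```

If the tag period X is not a multiple of 4, one packed word is not enough. One possible workaround is to precompute a table of packed words covering the least common multiple of X and 4 samples.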

Benoit Thiery
This is what I was starting to feel most comfortable with, thanks for the long suggestion. And my tags do repeat, so pre-building will work well. Marking as answer until/unless someone comes up with a rebuttal. Thanks!
Tom