Here's my attempt. Any tips on a better solution?

// for loop to convert 32 to 16 bits
uint32_t i;
int32_t * samps32 = (int32_t *)&(inIQbuffer[0]);
int16_t * samps16 = (int16_t *)&(outIQbuffer[0]);
for( i = 0; i < ( num_samples * 2/* because each sample is two int32 s*/ ); i++ ) {
    overflowCount += ( abs(samps32[i]) & 0xFFFF8000 ) ? 1 : 0; 
    samps16[i] = (int16_t)samps32[i];
}

// Only report error every 4096 accumulated overflows
if( ( overflowCount & 0x1FFF ) > 4096 ) {
    printf( "ERROR: Overflow has occurred while scaling from 32 "
            "bit to 16 bit samples %d times", 
            overflowCount );
}

Here's the part that actually checks for overflow:

overflowCount += ( abs(samps32[i]) & 0xFFFF8000 ) ? 1 : 0;
+3  A: 

I personally prefer to use the SafeInt class for overflow checking. It reduces the need for tedious error checking and turns an overflow into an exception that is easy to process yet difficult to ignore.

http://blogs.msdn.com/david_leblanc/archive/2008/09/30/safeint-3-on-codeplex.aspx

JaredPar
+1  A: 

What you already do is close to the fastest possible for a single cast. You can, however, omit some code:

overflowCount += ( abs(samps32[i]) & 0xFFFF8000 ) ? 1 : 0;

can be changed into:

if (samps32[i] & 0xFFFF8000) overflowCount++;

or, even simpler:

if (samps32[i] >> 15) overflowCount++;

Both of these will be equally fast, and both will be faster than yours.
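One caveat worth checking before dropping abs(): the mask test then also fires for every negative sample, because the upper bits of a negative int32_t are all set. Below is a minimal sketch comparing the question's test, the mask without abs(), and an explicit range check. The function names are hypothetical and only for illustration, and int32_t is assumed to be the same width as int so that abs() applies.

```c
#include <stdint.h>
#include <stdlib.h>

/* The question's test: any bit above bit 14 set in the magnitude.
   Note: abs(INT32_MIN) is undefined, and -32768 is flagged although it fits. */
static int overflows_abs_mask(int32_t s) {
    return (abs(s) & 0xFFFF8000) != 0;
}

/* The same mask without abs(): also nonzero for every negative value. */
static int overflows_mask(int32_t s) {
    return ((uint32_t)s & 0xFFFF8000u) != 0;
}

/* Hypothetical reference: explicit check against the int16_t range. */
static int overflows_range(int32_t s) {
    return s < INT16_MIN || s > INT16_MAX;
}
```

With these, -5 is reported as an overflow only by overflows_mask, while 40000 is reported by all three.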

If you are actually interested in the count of overflows, you might consider processing the array of integers with SIMD operations.

Zuu
They are not necessarily faster. The ternary operation by the OP is quite trivial and could be optimized to a conditional move by the compiler, as could both of your alternatives. Even if there is no cmove available, your code could still perform worse because of the conditional branch...
phresnel
... whereas the ternary operation is easier to be transformed into a lookup table lookup.
phresnel
No need for conditional branching; the compiler can use `setz` and similar instructions.
Anton Tykhyy
phresnel: the reason they are faster is not only the shorthand conditional expression, but also that the call to abs() has been removed and that an increment is used rather than an addition. And using a lookup table would just be a huge overhead.
Zuu
A: 

Bit ops would be my choice, too. The only faster way I can imagine at the moment is inline assembly: load the source operand, copy it in a register, truncate, and do a bitwise compare (that was pseudo-pseudocode).

Your code has an issue: It violates aliasing rules. You could use something like this instead:

union conv_t {
    int32_t i32;
    int16_t i16;
};

Then you could ensure that IQBuffer is of that type. Finally, you could run:

for( i = 0; i < (num_samples * 2); i++ ) {
    <test goes here>
    samps [i].i16 = static_cast<int16_t>(samps [i].i32);
}
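A minimal sketch of that loop in plain C, assuming the buffer really is declared as an array of conv_t. The helper name narrow_in_place and the range check standing in for the test are illustrative assumptions, and a C cast replaces the static_cast above.

```c
#include <stddef.h>
#include <stdint.h>

union conv_t {
    int32_t i32;
    int16_t i16;
};

/* Hypothetical helper: narrows each element in place and returns how many
   values were outside the int16_t range. */
static size_t narrow_in_place(union conv_t *samps, size_t count) {
    size_t overflowCount = 0;
    for (size_t i = 0; i < count; i++) {
        int32_t v = samps[i].i32;              /* read the active member */
        overflowCount += (v < INT16_MIN) | (v > INT16_MAX);
        /* Out-of-range conversion is implementation-defined in C;
           common compilers simply truncate. */
        samps[i].i16 = (int16_t)v;             /* i16 is now the active member */
    }
    return overflowCount;
}
```

Note that each 16-bit result still occupies a full sizeof(union conv_t) slot, so this does not produce a packed int16_t buffer.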

edit: Your edit (http://stackoverflow.com/revisions/677427/list) rendered nearly my whole post invalid. Thanks for not mentioning your edit in your question.

phresnel
+1  A: 

It seems that you are checking for the overflow of a 16-bit addition. You can avoid a branch in the generated assembly by writing

overflowCount += (samps32[i] & 0x8000) >> 15;

This generates three ALU operations but no branch in the code. It may or may not be faster than a branching version.
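The expression above looks only at bit 15. If the goal is instead to count every sample that does not fit in an int16_t, a comparison can be added to the counter in the same way. The sketch below assumes that goal; convert_and_count is a hypothetical name, and whether the compiler actually emits branch-free code for it is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts `count` samples and returns how many were
   outside the int16_t range. */
static size_t convert_and_count(const int32_t *in, int16_t *out, size_t count) {
    size_t overflowCount = 0;
    for (size_t i = 0; i < count; i++) {
        int32_t s = in[i];
        /* Shift the valid range [-32768, 32767] to [0, 65535] using unsigned
           arithmetic; anything outside it then compares greater than 65535. */
        overflowCount += ((uint32_t)s + 32768u) > 65535u;
        /* Out-of-range conversion is implementation-defined in C;
           common compilers simply truncate. */
        out[i] = (int16_t)s;
    }
    return overflowCount;
}
```

This is one addition and one comparison per sample, and the unsigned arithmetic avoids signed-overflow problems for values near INT32_MAX or INT32_MIN.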

antti.huima