I need to repeatedly convert 1024+ consecutive 4-byte floats (range -1 to 1) to 2-byte shorts (range -32768 to 32767) and write them to disk.

Currently I do this with a loop:

short v = 0;
for (unsigned int sample = 0; sample < length; sample++) 
{
    v = (short)(inbuffer[sample * 2] * 32767.0f);
    fwrite(&v, 2, 1, file);
}

This works, but the floating-point calculation and the loop are expensive. Is there any way to optimize this?

+2  A: 

I would have thought the repeated calls to fwrite would be the expensive part. How about:

short outbuffer[length]; // note: use malloc instead if length isn't a compile-time constant and your compiler doesn't support C99 variable-length arrays.
for (unsigned int sample = 0; sample < length; sample++) 
{
    outbuffer[sample] = (short)(inbuffer[sample * 2] * 32767.0f);
}
fwrite(outbuffer, sizeof *outbuffer, length, file);
David
Even if dynamic (variable-length) arrays are available, it is not a good idea to use them where you don't know a bound on the size. Beware of stack overflow.
Jens Gustedt
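
Following up on the stack-overflow concern above, here is a minimal sketch of the same single-write approach with the buffer on the heap. The function name write_samples_heap and its return convention are hypothetical, chosen only for illustration; the stride of 2 in inbuffer is copied from the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: converts `length` samples (reading every second
   float, as in the question) and writes them with a single fwrite call.
   Returns 0 on success, -1 on allocation or write failure. */
int write_samples_heap(const float *inbuffer, size_t length, FILE *file)
{
    if (length == 0)
        return 0; /* nothing to do; also avoids malloc(0) ambiguity */

    short *outbuffer = malloc(length * sizeof *outbuffer);
    if (outbuffer == NULL)
        return -1;

    for (size_t sample = 0; sample < length; sample++)
        outbuffer[sample] = (short)(inbuffer[sample * 2] * 32767.0f);

    /* fwrite returns the number of complete elements written */
    size_t written = fwrite(outbuffer, sizeof *outbuffer, length, file);
    free(outbuffer);
    return written == length ? 0 : -1;
}
```

If this runs repeatedly with similar lengths, allocating the buffer once and reusing it would avoid a malloc/free pair per call.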
+4  A: 
short v = 0;
for (unsigned int sample = 0; sample < length; sample++) 
{
    v = (short)(inbuffer[sample * 2] * 32767.0f);
    // The problem is not here-------^^^^^^^^^^^
    fwrite(&v, 2, 1, file);        
    // it is here ^^^^^^^
}

A typical Mac (Objective-C tag, or are we talking about the iPhone here?) can do billions of float multiplications per second. fwrite, however, is a library call that goes through some indirection to copy its data into a buffer and possibly flush it. It is better to fill your own buffer and write it in one batch:

short v[SZ] = {0};
// make sure SZ is always >= length, or allocate a working buffer on the heap.
for (unsigned int sample = 0; sample < length; sample++) 
{
    v[sample] = (short)(inbuffer[sample * 2] * 32767.0f);
}
// write only the `length` samples that were filled, not the whole array
fwrite(v, sizeof v[0], length, file);
Luther Blissett
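
A variation on the batching idea above, in case neither a guaranteed bound on length nor a heap allocation is convenient: convert and write in fixed-size chunks. This is a sketch, not the answerer's code; CHUNK and write_samples_chunked are hypothetical names, and 1024 is an arbitrary chunk size.

```c
#include <stdio.h>

#define CHUNK 1024 /* illustrative chunk size, small enough for the stack */

/* Hypothetical helper: converts `length` samples (stride 2 in inbuffer,
   as in the question) and issues one fwrite per chunk of up to CHUNK
   samples. Returns 0 on success, -1 on a short write. */
int write_samples_chunked(const float *inbuffer, size_t length, FILE *file)
{
    short chunk[CHUNK];
    size_t done = 0;

    while (done < length) {
        size_t n = length - done;
        if (n > CHUNK)
            n = CHUNK;

        for (size_t i = 0; i < n; i++)
            chunk[i] = (short)(inbuffer[(done + i) * 2] * 32767.0f);

        if (fwrite(chunk, sizeof chunk[0], n, file) != n)
            return -1;
        done += n;
    }
    return 0;
}
```

The number of fwrite calls drops from length to roughly length / CHUNK, while stack use stays fixed regardless of the input size.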
+2  A: 

I suppose the bottleneck of your loop is not the float-to-short conversion but writing the output to the file. Try moving the file output outside the loop:

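// create an output buffer of the required size (needs <stdlib.h>)
short *outbuffer = malloc(length * sizeof *outbuffer);
// handle outbuffer == NULL here before using it
for (unsigned int sample = 0; sample < length; sample++) 
{
    outbuffer[sample] = (short)(inbuffer[sample * 2] * 32767.0f);
}

// one write of `length` elements, each sizeof(short) bytes
fwrite(outbuffer, sizeof *outbuffer, length, file);
free(outbuffer);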
Vladimir
A: 

You could try something like this:

out[i] = table[((uint32_t *)in)[i]>>16];

where table is a lookup table that maps the upper 16 bits of an IEEE float (1 sign bit, 8 exponent bits, and 7 mantissa bits) to the int16_t value you want. However, that will lose some precision. You'd need to keep and use 23 bits (1 sign bit, 8 exponent bits, and 14 mantissa bits) for full precision, and that means a 16 MB table (2^23 entries of 2 bytes each), which will kill cache locality and thus performance.
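
To make the 16-bit variant concrete, here is a rough sketch of how such a table could be built and used. It assumes 32-bit IEEE 754 floats; build_table and lookup_convert are hypothetical names, the handling of NaN and out-of-range values is my own choice, and memcpy is used in place of the pointer cast to stay clear of strict-aliasing problems.

```c
#include <stdint.h>
#include <string.h>

/* One entry per possible value of the upper 16 bits of a float (128 KB). */
static int16_t table[65536];

/* Fill the table once: rebuild the float whose upper 16 bits are `key`
   (lower 16 bits zero) and store its converted value. */
static void build_table(void)
{
    for (uint32_t key = 0; key < 65536; key++) {
        uint32_t bits = key << 16;
        float f;
        memcpy(&f, &bits, sizeof f);

        if (f != f)
            table[key] = 0;          /* NaN: arbitrary choice */
        else if (f > 1.0f)
            table[key] = 32767;      /* clamp out-of-range input */
        else if (f < -1.0f)
            table[key] = -32767;
        else
            table[key] = (int16_t)(f * 32767.0f);
    }
}

/* Convert one sample by table lookup on the upper 16 bits. */
static int16_t lookup_convert(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return table[bits >> 16];
}
```

Because only 7 mantissa bits survive, the result can be off by a couple of hundred counts near full scale, which illustrates the precision loss described above.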

Are you sure that the floating-point conversions are slow? As long as you're using fwrite that way, you're spending a good 50-100 times as much CPU time in fwrite as on floating-point arithmetic. If you deal with this issue and the code is still too slow, you could use an approach of adding a magic bias and reading off the mantissa bits to convert to int16_t instead of multiplying by 32767.0. That might or might not be faster.
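
One possible form of the magic-bias trick, sketched under assumptions: 32-bit IEEE 754 floats, the default round-to-nearest mode, and inputs in [-1, 1]. The constant 384.0f and the name bias_convert are illustrative choices, not something specified above. Note that this variant scales by 32768 and rounds to nearest, so its results differ slightly from the truncating multiplication by 32767.0f.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper. Floats in [256, 512) are spaced 2^-15 apart, so
   adding 384.0f to x in [-1, 1] rounds x to a multiple of 2^-15. The
   23-bit mantissa field of the sum then holds 0x400000 + round(x * 32768). */
static int16_t bias_convert(float x)
{
    float biased = x + 384.0f;
    uint32_t bits;
    memcpy(&bits, &biased, sizeof bits);

    /* Remove the 0x400000 offset contributed by the bias itself. */
    int32_t v = (int32_t)(bits & 0x7FFFFF) - 0x400000;

    /* x == 1.0f yields 32768, which does not fit in int16_t. */
    if (v > 32767)
        v = 32767;
    return (int16_t)v;
}
```

For example, bias_convert(0.5f) gives 16384, whereas the original expression (short)(0.5f * 32767.0f) gives 16383. Whether this beats a plain multiply and cast depends on the compiler and CPU, so it is worth measuring before adopting it.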

R..