I'm currently working on the iPhone with Audio Units and I'm playing four tracks simultaneously. To improve the performance of my setup, I thought it would be a good idea to minimize the number of Audio Units / threads, by mixing down the four tracks into one.

With the following code I process the next buffer by adding up the samples of the four tracks, clamping the sum to the SInt16 range, and storing it in a temporary buffer, which is later copied into the ioData.mBuffers of the Audio Unit.

Although it works, I don't have the impression that this is the most efficient way to do this.

SInt16* buffer = bufferToWriteTo; 
int reads      = bufferSize/sizeof(SInt16);      
SInt16** files = circularBuffer->files;

float tempValue;
SInt16 values[reads];
int k,j;       
int numFiles=4;

for (k=0; k<reads; k++)
{
    tempValue=0.f; 
    for (j=0; j<numFiles; j++) 
    {
        tempValue += files[j][packetNumber];     
    }
    if      (tempValue >  32767.f) tempValue =  32767.f;
    else if (tempValue < -32768.f) tempValue = -32768.f;

    values[k]  = (SInt16) tempValue;
    values[k] += values[k] << 16;
    packetNumber++;
    if (packetNumber >= totalPackets) packetNumber=0;
}
memcpy(buffer,values,bufferSize);

Any ideas or pointers to speed this up? Am I right?

+1  A: 

A couple of pointers, even though I'm not really familiar with iPhone development.

You could unroll the inner loop. You don't need a for loop to add four numbers together, although your compiler might do this for you.

Write directly to the buffer in your for loop. memcpy at the end will do another loop to copy the buffers.

Don't use a float for tempValue. Depending on the hardware, integer math is quicker, and you don't need floats for summing channels.

Remove the if/else clamp. Digital clipping will sound horrible anyway, so try to avoid it before summing the channels together. Branching inside a loop like this should be avoided if possible.
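As a rough sketch of these tips combined (not the asker's actual code): the loop below adds four tracks with a 32-bit integer accumulator and writes straight into the output buffer, so there is no float conversion and no memcpy. The function name `mix4_clamped` and the four separate track pointers are assumptions made for illustration, and the typedefs only stand in for the Core Audio types so the snippet compiles on its own. The clamp is kept as a safety net here, because without it an overflowing sum wraps around, which sounds worse than clipping; it can be dropped only if the tracks are attenuated beforehand.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Core Audio typedefs so the sketch is self-contained. */
typedef int16_t SInt16;
typedef int32_t SInt32;

/* Hypothetical helper: mix four tracks of `count` samples into `out`. */
static void mix4_clamped(SInt16 *out,
                         const SInt16 *a, const SInt16 *b,
                         const SInt16 *c, const SInt16 *d,
                         size_t count)
{
    for (size_t k = 0; k < count; k++) {
        /* The sum of four 16-bit samples always fits in 32 bits. */
        SInt32 sum = (SInt32)a[k] + b[k] + c[k] + d[k];

        /* Safety net: clamp to the SInt16 range instead of wrapping. */
        if (sum > INT16_MAX)      sum = INT16_MAX;
        else if (sum < INT16_MIN) sum = INT16_MIN;

        /* Write directly to the destination; no temporary buffer. */
        out[k] = (SInt16)sum;
    }
}
```

Whether this beats the float version on a given device is something to measure rather than assume.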

Mendelt
Thanks for the tips! Writing directly to the buffer was a good one. Nevertheless, I believe floating point math is faster on the iPhone.
Kriem
The math may be fast, but the conversions from int to float and back again might be slow. But Rom already told you that. Just try it out and measure :-) Profiling is always better than talking about performance.
Mendelt
I see. Will try that. Thanks.
Kriem
+1  A: 

The biggest improvement you can get from this code would come from not using floating-point arithmetic. While the arithmetic itself is fast, the conversions that happen in the nested loops take a long time, especially on the ARM processor in the iPhone. You can achieve exactly the same results by using 'SInt32' instead of 'float' for the 'tempValue' variable.

Also, see if you can get rid of the memcpy() on the last line: perhaps you can fill 'buffer' directly, without using the temporary buffer called 'values'. That saves one copy, which would be a significant improvement for such a function.

Other notes: the last two lines of the loop probably belong outside of the loop and the body of the nested loop should use 'k' as a second index, instead of 'packetNumber', but I'm not sure about this logic.

And the last note: you're squashing the peaks of your resulting sound. While this seems like a good idea, it will sound pretty rough. You probably want to scale the result down instead of clipping it. That is, instead of this code

for (j=0; j<numFiles; j++) 
{
    tempValue += files[j][packetNumber];            
}
if      (tempValue >  32767.f) tempValue =  32767.f;
else if (tempValue < -32768.f) tempValue = -32768.f;

you probably want something like this:

for (j=0; j<numFiles; j++) 
{
    tempValue += files[j][packetNumber] / numFiles;            
}
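A minimal integer sketch of the same scaling idea, assuming the tracks are indexed by sample position: it sums into a 32-bit accumulator and divides once at the end. Dividing each sample before adding truncates every term separately, so a single division after the sum loses a little less precision. The name `mix_average` is hypothetical, and the typedefs are stand-ins for the Core Audio types.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Core Audio typedefs so the sketch is self-contained. */
typedef int16_t SInt16;
typedef int32_t SInt32;

/* Hypothetical helper: average `numFiles` tracks into `out`.
   Assumes numFiles > 0 and every track holds at least `count` samples. */
static void mix_average(SInt16 *out, SInt16 **files,
                        int numFiles, size_t count)
{
    for (size_t k = 0; k < count; k++) {
        SInt32 sum = 0;
        for (int j = 0; j < numFiles; j++) {
            sum += files[j][k];
        }
        /* The average of SInt16 values is itself in the SInt16 range,
           so no clamping is needed after the division. */
        out[k] = (SInt16)(sum / numFiles);
    }
}
```

For example, four tracks that each hold the sample value 3 average to 3 this way, whereas dividing each sample by 4 first would give 0. The trade-off of averaging is that a single loud track ends up at a quarter of its original level.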

Edit: and please do not forget to measure the performance before and after, to see which of the improvements had the biggest impact. This is the best way to learn about performance: trial and measurement.

Rom
Thanks! These tips helped. :)
Kriem
memcpy, definitely. :)
Kriem
A: 

One thing I found when writing the audio mixing routines for my app is that incrementing pointers worked much faster than indexing. Some compilers may sort this out for you (I'm not sure about the iPhone's), but it certainly gave my app a big boost in these tight loops (about 30%, if I recall).

E.g., instead of this:

for (k=0; k<reads; k++)
{
    // Use buffer[k]
}

do this:

SInt16* p=buffer;
SInt16* pEnd=buffer+reads;
while (p!=pEnd)
{
    // Use *p
    p++;
}
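Applied to the four-track mix from the question, the pointer-walking form might look roughly like the sketch below. The function name `mix4_pointers` and the separate track pointers are assumptions for illustration, and the typedefs stand in for the Core Audio types. An optimizing compiler may well produce the same machine code for the indexed version, so treat any speed-up as something to confirm by profiling.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Core Audio typedefs so the sketch is self-contained. */
typedef int16_t SInt16;
typedef int32_t SInt32;

/* Hypothetical helper: mix four tracks into `buffer` by walking pointers. */
static void mix4_pointers(SInt16 *buffer,
                          const SInt16 *a, const SInt16 *b,
                          const SInt16 *c, const SInt16 *d,
                          size_t reads)
{
    SInt16 *p = buffer;
    SInt16 *pEnd = buffer + reads;

    while (p != pEnd) {
        /* Read one sample from each track and advance each source pointer. */
        SInt32 sum = (SInt32)*a++ + *b++ + *c++ + *d++;

        /* Clamp to the SInt16 range so an overflow does not wrap around. */
        if (sum > INT16_MAX)      sum = INT16_MAX;
        else if (sum < INT16_MIN) sum = INT16_MIN;

        /* Store the mixed sample and advance the destination pointer. */
        *p++ = (SInt16)sum;
    }
}
```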

Also, I believe the iPhone has some sort of SIMD (single instruction, multiple data) support called VFP. This would let you perform math on a number of samples in one instruction, but I know little about this on the iPhone.

cantabilesoftware