I want to write an app that transposes the key a WAV file plays in (just for fun; I know there are apps that already do this). My understanding of how this might be accomplished is to:

1) chop the audio file into very small blocks (say 1/10 of a second each)

2) run an FFT on each block

3) shift the frequency-domain data up or down depending on what key I want

4) use an inverse FFT to return each block to the time domain

5) glue all the blocks together

But now I'm wondering whether the transformed blocks would still be continuous when I try to glue them back together. Any ideas on how I should do this to guarantee continuity, or am I just worrying about nothing?
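For reference, steps 2 to 4 can be sketched for a single block as below. This is only an illustration under assumptions: it uses a naive O(N^2) DFT from the standard library as a stand-in for a real FFT, and the helper names (`dft`, `idft`, `shift_bins`, `peak_bin`) are made up for the example. It reads step 3 as moving every bin by the same number of bins, which may or may not be what a real transposer should do.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns complex samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def shift_bins(block, shift):
    """Move each positive-frequency bin of a real block by `shift` bins.

    DC, the Nyquist bin, and anything shifted out of range are dropped.
    """
    n = len(block)
    half = n // 2
    spectrum = dft(block)
    shifted = [0j] * n
    for k in range(1, half):
        target = k + shift
        if 0 < target < half:
            shifted[target] = spectrum[k]
            # Mirror into the negative frequencies so the output stays real.
            shifted[n - target] = spectrum[k].conjugate()
    return [sample.real for sample in idft(shifted)]

def peak_bin(block):
    """Index of the strongest positive-frequency bin."""
    spectrum = dft(block)
    return max(range(1, len(block) // 2), key=lambda k: abs(spectrum[k]))

N = 64
tone = [math.cos(2 * math.pi * 5 * t / N) for t in range(N)]  # tone on bin 5
shifted_tone = shift_bins(tone, 3)
print(peak_bin(tone), peak_bin(shifted_tone))
```

Here the test tone sits exactly on bin 5 and ends up on bin 8. Real audio will not line up with the bins this neatly, which is where the continuity question comes in.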

A: 

You may have to find a zero-crossing between the blocks to glue the individual wavs back together. Otherwise you may find that you are getting clicks or pops between the blocks.
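A minimal sketch of locating such a splice point, assuming the samples are in a plain Python list; the function name `nearest_zero_crossing` is hypothetical and not from any library:

```python
def nearest_zero_crossing(samples, index):
    """Return the position i closest to `index` at which the signal changes
    sign between samples[i - 1] and samples[i], or None if it never does."""
    best = None
    for i in range(1, len(samples)):
        rising = samples[i - 1] < 0 <= samples[i]
        falling = samples[i - 1] >= 0 > samples[i]
        if (rising or falling) and (best is None or abs(i - index) < abs(best - index)):
            best = i
    return best

block = [1.0, 0.5, -0.5, -1.0, -0.2, 0.3, 0.8]
print(nearest_zero_crossing(block, 4))
```

With the sample data above, the crossings are at positions 2 and 5, so asking for the one nearest to position 4 gives 5.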

Robert Harvey
Yeah, that was what I was concerned about, but just making it continuous at the boundary is probably not good enough. I suspect a discontinuous gradient or even second derivative might give me clicks too.
tbischel
+2  A: 

Overlap the time samples for each block by half so that each block after the first consists of the last N/2 samples from the previous block and N/2 new samples. Be sure to apply some window to the samples before the transform.

After shifting the frequency, perform an inverse FFT and use the middle N/2 samples from each block. You'll need to adjust the final gain after the IFFT.
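The framing described above could look roughly like this. It is a sketch under assumptions: the window is a Hann window (the answer only says "some window"), `process` is a hypothetical placeholder for the FFT, shift, and inverse FFT steps, and the gain adjustment is read here as dividing the kept samples by the window, which is only one possible interpretation.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def process_half_overlapped(samples, n, process):
    """Split `samples` into blocks of n samples overlapping by n/2, window
    each block, run `process` on it, and keep only the middle n/2 samples.

    n is assumed to be divisible by 4.
    """
    hop, quarter = n // 2, n // 4
    window = hann(n)
    out = []
    for start in range(0, len(samples) - n + 1, hop):
        block = [samples[start + i] * window[i] for i in range(n)]
        result = process(block)  # hypothetical: FFT, shift, inverse FFT
        # Assumed gain correction: undo the window over the kept region,
        # where the Hann window is never below 0.5.
        out.extend(result[i] / window[i] for i in range(quarter, n - quarter))
    return out

signal = [math.sin(0.3 * i) for i in range(32)]
stitched = process_half_overlapped(signal, 8, lambda block: block)
print(len(stitched))
```

With a do-nothing `process`, the stitched output matches the input apart from the first and last n/4 samples, which no block's middle section covers.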

Alternatively, mixing the time samples with a sine wave and then low-pass filtering produces the same shift directly in the time domain. The frequency of the mixer would be the desired frequency difference.
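To see what the mixing stage does on its own, here is a small sketch. The sample rate, tone, and mixer frequencies are made-up values, and `mix_with_sine` and `tone_amplitude` are hypothetical helpers. Multiplying a tone by a sine produces components at both the sum and the difference of the two frequencies; the low-pass filter (not shown here) would then be what keeps only the lower one.

```python
import math

def mix_with_sine(samples, mixer_hz, sample_rate):
    """Multiply each sample by a sine wave at mixer_hz."""
    return [s * math.sin(2 * math.pi * mixer_hz * i / sample_rate)
            for i, s in enumerate(samples)]

def tone_amplitude(samples, freq_hz, sample_rate):
    """Estimate the amplitude of the component at freq_hz by correlating
    the signal with a cosine and a sine at that frequency."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

RATE = 800  # assumed sample rate in Hz, kept small so the example runs fast
tone = [math.cos(2 * math.pi * 100 * i / RATE) for i in range(RATE)]  # 100 Hz
mixed = mix_with_sine(tone, 30, RATE)  # 30 Hz mixer

for hz in (70, 100, 130):
    print(hz, round(tone_amplitude(mixed, hz, RATE), 3))
```

The original 100 Hz component disappears and two half-amplitude components appear at 70 Hz and 130 Hz.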

Larry
I think the mixer is probably more what the OP is looking for. But if there's a filter involved, the overlap-save FFT trick is really nice. See http://en.wikipedia.org/wiki/Overlap-save_method for more details. If you do it right, you don't need a window either - the window is more for analysis applications.
mtrw
@larry I don't see how this resolves the discontinuity... it seems like the resulting signals would generally be out of phase with the past block. As for frequency mixing a sine wave, I'm not familiar with that approach.
tbischel
@mtrw thanks for the link, I'll look over it
tbischel
A: 

Found this great article on the subject, for anyone trying it in the future!

tbischel
+1  A: 

For speech you might want to look at PSOLA - this is a popular algorithm for pitch-shifting and/or time stretching/compression which is a little more sophisticated than the basic overlap-add method, but not much more complex.

If you need to process non-speech samples, e.g. music, then there are several possibilities; however, the overlap-add FFT/modify/IFFT approach mentioned in other answers is probably the best bet.
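A bare-bones sketch of the overlap-add framing, assuming a Hann window with 50% overlap (a common choice, not something this answer specifies). `modify` is a hypothetical placeholder for the FFT/modify/IFFT step. With this window and hop the overlapping windows sum to 1, so a do-nothing `modify` gives back the input wherever two blocks overlap.

```python
import math

def overlap_add(samples, n, modify):
    """Window half-overlapping blocks of n samples, run `modify` on each,
    and sum the results back into place (n is assumed to be even)."""
    hop = n // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    out = [0.0] * len(samples)
    for start in range(0, len(samples) - n + 1, hop):
        block = [samples[start + i] * window[i] for i in range(n)]
        result = modify(block)  # hypothetical: FFT, modify, inverse FFT
        for i in range(n):
            out[start + i] += result[i]
    return out

signal = [math.sin(0.3 * i) for i in range(32)]
rebuilt = overlap_add(signal, 8, lambda block: block)
print(max(abs(rebuilt[i] - signal[i]) for i in range(4, 28)))
```

The first and last half block are covered by only one window, so they come out attenuated and would need separate handling (for example, padding the input).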

Paul R
I've had the impression that PSOLA is primarily for speech and not music. Is this correct?
tom10
@tom10: good point - I don't know how well it would work for, e.g. music. I guess a more basic overlap-add approach might be more appropriate if this is for an application other than speech. I'll edit my answer accordingly.
Paul R