Fourier transform to transpose the key of a WAV file

I want to write an app that transposes the key a WAV file plays in (for fun; I know there are apps that already do this). My understanding of how this might be accomplished is to:

1) chop the audio file into very small blocks (say 1/10 of a second)

2) run an FFT on each block

3) shift the spectrum up or down in frequency depending on what key I want

4) use an inverse FFT to return each block to the time domain

5) glue all the blocks together

But now I'm wondering whether the transformed blocks would still be continuous when I glue them back together. Any ideas on how to do this so that continuity is guaranteed, or am I worrying about nothing?
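As a side note on step 3, here is a minimal sketch of how far each frequency has to move for a given transposition. It assumes 12-tone equal temperament, and the helper names `semitone_ratio` and `transpose_hz` are made up for illustration. The point it shows is that a transposition multiplies every frequency by the same ratio, so higher components move by more Hz than lower ones.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a transposition by `semitones`
    (assumes 12-tone equal temperament)."""
    return 2.0 ** (semitones / 12.0)


def transpose_hz(freq_hz, semitones):
    """Frequency a component at `freq_hz` should land on after transposing."""
    return freq_hz * semitone_ratio(semitones)


if __name__ == "__main__":
    # The same 3-semitone transposition moves each tone by a different
    # number of Hz, because the ratio (not the offset) is constant.
    for f in (220.0, 440.0, 880.0):
        print(f, "->", round(transpose_hz(f, 3), 2))
```

Because the offset in Hz differs per component, sliding the whole spectrum by a fixed number of bins gives a frequency shift and not a true transposition.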

You may have to find a zero-crossing between the blocks to glue the individual wavs back together. Otherwise you may find that you are getting clicks or pops between the blocks.

Robert Harvey 2010-04-12 19:28:39

Yeah, that was what I was concerned about, but just making it continuous at the boundary is probably not good enough. I suspect a discontinuous gradient or even second derivative might give me clicks too.

tbischel 2010-04-12 21:03:22

+2 A:

Overlap the time samples for each block by half so that each block after the first consists of the last N/2 samples from the previous block and N/2 new samples. Be sure to apply some window to the samples before the transform.

After shifting the frequency, perform an inverse FFT and use the middle N/2 samples from each block. You'll need to adjust the final gain after the IFFT.
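The block processing described above can be sketched roughly as follows. This is an illustration under assumptions, not the exact recipe from the answer: it uses a periodic Hann window with 50% overlap and sums the overlapping output blocks (overlap-add) instead of keeping only the middle N/2 samples, because that window sums to one across blocks and needs no extra gain correction. The names `shift_bins` and `shift_overlap_add` are hypothetical, and the small pure-Python FFT is only there to keep the example self-contained. It moves every bin by the same amount and does nothing to keep phases consistent from block to block.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X):
    """Inverse FFT via the conjugate trick."""
    n = len(X)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in X])]


def shift_bins(X, k):
    """Move bins 0..N/2 up (k > 0) or down (k < 0) by k bins, then rebuild
    the negative frequencies so the inverse transform is real."""
    n = len(X)
    half = n // 2
    Y = [0j] * n
    for b in range(half + 1):
        if 0 <= b + k <= half:
            Y[b + k] = X[b]
    # DC and Nyquist bins must be real for a real-valued output.
    Y[0] = complex(Y[0].real, 0.0)
    Y[half] = complex(Y[half].real, 0.0)
    for b in range(1, half):
        Y[n - b] = Y[b].conjugate()
    return Y


def shift_overlap_add(samples, k, n=64):
    """Window, transform, shift, inverse-transform and overlap-add
    blocks of n samples taken every n/2 samples."""
    hop = n // 2
    # Periodic Hann window: copies spaced n/2 apart sum to exactly 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    out = [0.0] * len(samples)
    for start in range(0, len(samples) - n + 1, hop):
        block = [samples[start + i] * window[i] for i in range(n)]
        y = ifft(shift_bins(fft(block), k))
        for i in range(n):
            out[start + i] += y[i].real
    return out
```

With `k = 0` the interior of the output reproduces the input, which is a quick way to check the windowing and gain before trying a nonzero shift. The first and last half block are covered by only one window and come out attenuated.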

Of course, mixing the time samples with a sine wave and then low pass filtering will provide the same shift in the time domain as well. The frequency of the mixer would be the desired frequency difference.
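A rough sketch of the mixer idea, under stated assumptions: the sample rate, tone frequency, cutoff, and tap count are arbitrary illustration values, and the function names are hypothetical. Multiplying by a sinusoid produces components at both the sum and the difference of the two frequencies, so the low-pass filter here (a simple Hamming-windowed sinc FIR) is what keeps only the lower one. That only works when the two images do not overlap in frequency.

```python
import math


def mix(samples, shift_hz, rate):
    """Multiply the signal by a cosine at shift_hz (the 'mixer')."""
    return [s * math.cos(2 * math.pi * shift_hz * i / rate)
            for i, s in enumerate(samples)]


def lowpass_fir(cutoff_hz, rate, taps=101):
    """Hamming-windowed sinc low-pass coefficients, normalized to unity DC gain."""
    fc = cutoff_hz / rate
    mid = (taps - 1) / 2
    h = []
    for i in range(taps):
        x = i - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        h.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * i / (taps - 1))))
    total = sum(h)
    return [c / total for c in h]


def convolve(samples, h):
    """Direct-form FIR filtering; returns only the fully overlapped part."""
    n = len(h)
    return [sum(h[j] * samples[i + n - 1 - j] for j in range(n))
            for i in range(len(samples) - n + 1)]


def tone_level(samples, freq_hz, rate):
    """Estimated amplitude of the component at freq_hz (single-bin DFT)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / rate)
             for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)


if __name__ == "__main__":
    rate = 8000  # assumed sample rate in Hz
    tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(2000)]
    # Mixing a 1000 Hz tone with 250 Hz gives images at 750 Hz and 1250 Hz;
    # a 1000 Hz low-pass keeps the 750 Hz image.
    shifted = convolve(mix(tone, 250, rate), lowpass_fir(1000, rate))
    print("750 Hz level:", tone_level(shifted, 750, rate))
    print("1250 Hz level:", tone_level(shifted, 1250, rate))
```

Each image carries half the original amplitude, so a gain of 2 after the filter would restore the level.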

Larry 2010-04-12 19:34:32

I think the mixer is probably more what the OP is looking for. But if there's a filter involved, the overlap-save FFT trick is really nice. See http://en.wikipedia.org/wiki/Overlap-save_method for more details. If you do it right, you don't need a window either - the window is more for analysis applications.

mtrw 2010-04-12 21:04:12

@larry I don't see how this resolves the discontinuity... it seems like the resulting signals would generally be out of phase with the past block. As for frequency mixing a sine wave, I'm not familiar with that approach.

tbischel 2010-04-12 21:13:39

@mtrw thanks for the link, I'll look over it

tbischel 2010-04-12 21:16:10

Found this great article on the subject, for anyone trying it in the future!

tbischel 2010-04-13 05:46:11

+1 A:

For speech you might want to look at PSOLA - this is a popular algorithm for pitch-shifting and/or time stretching/compression which is a little more sophisticated than the basic overlap-add method, but not much more complex.

If you need to process non-speech samples, e.g. music, then there are several possibilities; however, the overlap-add FFT/modify/IFFT approach mentioned in other answers is probably the best bet.

Paul R 2010-04-13 06:48:36

I've had the impression that PSOLA is primarily for speech and not music. Is this correct?

tom10 2010-04-13 16:10:20

@tom10: good point - I don't know how well it would work for, e.g. music. I guess a more basic overlap-add approach might be more appropriate if this is for an application other than speech. I'll edit my answer accordingly.

Paul R 2010-04-13 16:16:54
