I'm writing some code that plays back WAV files at different speeds, so that the wave is either slower and lower-pitched, or faster and higher-pitched. I'm currently using simple linear interpolation, like so:

            // Stretch the source by lengthMultiplier: >1 slows/lowers, <1 speeds/raises.
            int newlength = (int)Math.Round(rawdata.Length * lengthMultiplier);
            float[] output = new float[newlength];

            for (int i = 0; i < newlength; i++)
            {
                // Map the output index back to a (fractional) position in the source.
                float realPos = i / lengthMultiplier;
                int iLow = (int)realPos;
                int iHigh = iLow + 1;
                float remainder = realPos - (float)iLow;

                // Out-of-range neighbors are treated as silence (phantom zero samples).
                float lowval = 0;
                float highval = 0;
                if ((iLow >= 0) && (iLow < rawdata.Length))
                {
                    lowval = rawdata[iLow];
                }
                if ((iHigh >= 0) && (iHigh < rawdata.Length))
                {
                    highval = rawdata[iHigh];
                }

                // Linear blend between the two nearest samples.
                output[i] = (highval * remainder) + (lowval * (1 - remainder));
            }

This works fine, but it tends to sound OK only when I lower the frequency of the playback (i.e. slow it down). If I raise the pitch on playback, this method tends to produce high-frequency artifacts, presumably because of the loss of sample information.

I know that cubic and other interpolation methods resample using more than just the two nearest sample values as in my code example, but I can't find any good code samples (C# preferably) that I could plug in to replace my linear interpolation method here.

Does anyone know of any good examples, or can anyone write a simple cubic interpolation method? I'll bounty this if I have to. :)

Update: here are a couple of C# implementations of interpolation methods (thanks to Donnie DeBoer for the first one and nosredna for the second):

    public static float InterpolateCubic(float x0, float x1, float x2, float x3, float t)
    {
        float a0, a1, a2, a3;
        a0 = x3 - x2 - x0 + x1;
        a1 = x0 - x1 - a0;
        a2 = x2 - x0;
        a3 = x1;
        return (a0 * (t * t * t)) + (a1 * (t * t)) + (a2 * t) + (a3);
    }

    public static float InterpolateHermite4pt3oX(float x0, float x1, float x2, float x3, float t)
    {
        float c0 = x1;
        float c1 = .5F * (x2 - x0);
        float c2 = x0 - (2.5F * x1) + (2 * x2) - (.5F * x3);
        float c3 = (.5F * (x3 - x0)) + (1.5F * (x1 - x2));
        return (((((c3 * t) + c2) * t) + c1) * t) + c0;
    }

In these functions, x1 is the sample value just before the point you're trying to estimate and x2 is the sample value just after it. x0 is the sample to the left of x1, and x3 is the sample to the right of x2. t runs from 0 to 1 and is the distance from x1 to the point you're estimating.
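
For reference, here's how one of these four-point interpolators might slot into the resampling loop at the top of the question. This is a C sketch with hypothetical names (`hermite4`, `sample_at`, `resample_hermite` are mine, not from the original code); clamping out-of-range indices to the first/last sample is one way to supply the "phantom" edge neighbors:

```c
#include <math.h>

/* 4-point, 3rd-order Hermite, same formula as InterpolateHermite4pt3oX above. */
static float hermite4(float x0, float x1, float x2, float x3, float t)
{
    float c0 = x1;
    float c1 = 0.5f * (x2 - x0);
    float c2 = x0 - 2.5f * x1 + 2.0f * x2 - 0.5f * x3;
    float c3 = 0.5f * (x3 - x0) + 1.5f * (x1 - x2);
    return ((c3 * t + c2) * t + c1) * t + c0;
}

/* Clamp an index into [0, len-1], so the edge intervals reuse the
   first/last sample as phantom neighbors. */
static float sample_at(const float *data, int len, int i)
{
    if (i < 0) i = 0;
    if (i >= len) i = len - 1;
    return data[i];
}

/* Resample raw[0..rawlen) into out[0..outlen).
   outlen / rawlen plays the role of lengthMultiplier in the question. */
void resample_hermite(const float *raw, int rawlen, float *out, int outlen)
{
    float step = (float)rawlen / (float)outlen;
    for (int i = 0; i < outlen; i++) {
        float realPos = i * step;       /* fractional position in the source */
        int i1 = (int)realPos;          /* x1 = sample at/left of the point  */
        float t = realPos - (float)i1;  /* distance from x1 toward x2        */
        out[i] = hermite4(sample_at(raw, rawlen, i1 - 1),
                          sample_at(raw, rawlen, i1),
                          sample_at(raw, rawlen, i1 + 1),
                          sample_at(raw, rawlen, i1 + 2),
                          t);
    }
}
```

A quick sanity check: on a linear ramp the Hermite interpolator reproduces the ramp exactly in the interior, so graphing input against output (as suggested in the comments below) makes index mix-ups easy to spot.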

The Hermite method seems to work pretty well, and appears to reduce the noise somewhat. More importantly it seems to sound better when the wave is sped up.

+1  A: 

You're looking for polynomial interpolation. The idea is that you pick a number of known data points around the point you want to interpolate, fit an interpolating polynomial through those data points, and then evaluate that polynomial at the interpolation point.

There are other methods. If you can stomach the math, look at signal reconstruction, or google for "signal interpolation".

Oren Trutner
I can stomach the math, but mainly I'm looking for code samples (preferably C# or C).
MusiGenesis
+2  A: 
double InterpCubic(double x0, double x1, double x2, double x3, double t)
{
   double a0, a1, a2, a3;

   a0 = x3 - x2 - x0 + x1;
   a1 = x0 - x1 - a0;
   a2 = x2 - x0;
   a3 = x1;

   return a0*(t*t*t) + a1*(t*t) + a2*t + a3;
}

where x1 and x2 are the samples being interpolated between, x0 is x1's left neighbor, and x3 is x2's right neighbor. t is [0, 1], denoting the interpolation position between x1 and x2.

Donnie DeBoer
Me likey. I'm going to try this out.
MusiGenesis
One note: since cubic interpolation uses 4 samples (the 2 being interpolated between and their 2 closest neighbors), you must figure out how to handle the very first interpolation interval and the very last interpolation interval of the waveform data. Often, people just invent phantom samples to the left and right.
Donnie DeBoer
No problem. My example above uses a phantom sample to the right.
MusiGenesis
Well, it works, but unfortunately it sounds even worse than linear interpolation. Linear interpolation seems to produce more of a general low-level hiss, whereas this cubic formula tends to produce a high-pitched ringing.
MusiGenesis
If you look at your processed sample with a spectrum analyzer, you'll see the hiss as a noise floor. The ringing is probably the result of a few added frequencies from the cubic that modulate in amplitude.
Nosredna
+5  A: 

My favorite resource for audio interpolating (especially in resampling applications) is Olli Niemitalo's "Elephant" paper.

I've used a couple of these and they sound terrific (much better than a straight cubic solution, which is relatively noisy). There are spline forms, Hermite forms, Watte, parabolic, etc. And they are discussed from an audio point-of-view. This is not just your typical naive polynomial fitting.

And code is included!

To decide which to use, you probably want to start with the table on page 56, which groups the algorithms by operator complexity (how many multiplies and how many adds). Then choose among the best signal-to-noise solutions, using your ears as a guide for the final choice. Generally, the higher the SNR, the better.

Nosredna
Great link, again. I'm digging into it now. I just implemented the cubic interpolation from another answer, and it sounded terrible.
MusiGenesis
If you use any of these with an SNR over, say, 60 dB, you should not be able to hear artifacts. If you do, you've probably made a mistake implementing. And don't underestimate the ease with which you can screw up which point is which and what your "between" value is. I screwed up my first try. You may have even messed up the cubic. It helps to graph sections of your input and output. :-)
Nosredna
Huh, like I evver make mistaks!
MusiGenesis
I think I implemented the cubic right - it was only a slight modification of my original code.
MusiGenesis
A bad result from the cubic would not surprise me. But it's incredibly easy to be off by one or get your "between value" at 1-x when it should be x, etc.
Nosredna
I might have missed this in the text, but what's the difference between an x-form implementation and a z-form implementation?
MusiGenesis
That's discussed on pages 35 and 36. If I recall correctly, they are equivalent in result, but one can be cheaper to calculate than the other depending on the specific interpolator. To you it should not make a difference. He does both for each equation and presents the cheaper solution.
Nosredna
Also, pardon my density here, but the linked article recommends these interpolators for oversampled data, but I'm working on pre-recorded WAV files which can't be oversampled (unless I'm misunderstanding the term?). The author says oversampling is left to the reader, so maybe I need to ask another question here.
MusiGenesis
It's been 3 or 4 years since I used one of these, though. So I'm highly likely to be wrong on any given point. The only thing I'm sure of is that the interpolators in the paper are better than anything else I ever found. I had a moment of panic when I searched for the paper, as it's not where it used to be. I got a 404 and almost fainted. Then I searched for the PDF's filename and finally found it. Scary.
Nosredna
The author mentions high-N oversampling with simple linear interpolation as a high-quality approach to this problem. This may be a better approach for my purposes, since I could do the oversampling once on the original audio, and then produce different playback speeds on the fly quickly with linear interpolation.
MusiGenesis
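
That oversample-once, then linear-interpolate-on-the-fly idea could be sketched roughly like this (C; `oversample` and `play_at` are hypothetical names, the 4x factor is an arbitrary assumption, and the Hermite formula is the 4-point one from the question's update):

```c
#include <math.h>
#include <stdlib.h>

/* 4-point, 3rd-order Hermite (same formula as in the question's update). */
static float hermite4(float x0, float x1, float x2, float x3, float t)
{
    float c0 = x1;
    float c1 = 0.5f * (x2 - x0);
    float c2 = x0 - 2.5f * x1 + 2.0f * x2 - 0.5f * x3;
    float c3 = 0.5f * (x3 - x0) + 1.5f * (x1 - x2);
    return ((c3 * t + c2) * t + c1) * t + c0;
}

/* One-time cost: build a factor-times oversampled copy of the wave with
   the high-quality interpolator. Edge indices are clamped. Caller frees. */
float *oversample(const float *raw, int rawlen, int factor)
{
    float *table = malloc((size_t)rawlen * factor * sizeof *table);
    for (int i = 0; i < rawlen * factor; i++) {
        int i1 = i / factor;
        float t = (float)(i % factor) / factor;
        int im1 = i1 > 0 ? i1 - 1 : 0;
        int ip1 = i1 + 1 < rawlen ? i1 + 1 : rawlen - 1;
        int ip2 = i1 + 2 < rawlen ? i1 + 2 : rawlen - 1;
        table[i] = hermite4(raw[im1], raw[i1], raw[ip1], raw[ip2], t);
    }
    return table;
}

/* Per-note cost: cheap linear interpolation into the oversampled table.
   pos is measured in oversampled samples; out-of-range neighbors read as 0. */
float play_at(const float *table, int tablelen, float pos)
{
    int i = (int)pos;
    float t = pos - (float)i;
    float lo = (i >= 0 && i < tablelen) ? table[i] : 0.0f;
    float hi = (i + 1 >= 0 && i + 1 < tablelen) ? table[i + 1] : 0.0f;
    return lo * (1 - t) + hi * t;
}
```

The expensive interpolation runs once per wave; each playback speed then only pays for a linear blend per output sample.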
I've saved a copy on my laptop - that one's a keeper.
MusiGenesis
It's ideal to have oversampled data. If you don't, there's nothing you can do about it. I've used these in real-time ring modulators, oscillators, etc.
Nosredna
Is it possible to produce oversampled data from a pre-recorded WAV file? I'll ask this again as a real question - I just don't want to look especially stupid.
MusiGenesis
@MusiGenesis, your situation is different from mine. Whatever you end up doing, if it gets good results I want to know what you did. What I like about these equations is you won't get noise or weird overtones.
Nosredna
No, you can't get oversampled data from a prerecorded WAV file. Nyquist, essentially, says you don't have any real info for any frequencies above 1/2 the sampling rate.
Nosredna
Since you have existing data (and it's the same problem as real-time generated data, except that we can choose to be running our oscillators at 2x, 4x, 8x, or whatever the processor will handle), all you can do is draw a curve through your points that adds the least amount of noise possible. You can't get any extra data from your sample--that data no longer exists--it may as well be in a black hole.
Nosredna
I'm going to implement the 4-point Hermite, which he seems to say is the best for un-oversampled data. The funny thing is, I've never really found linear interpolation to be all that awful, but I'm using WAV file data for pitched synthesizer sounds and I can usually get away with only lowering the frequency. I'd like to be able to raise the pitch without sounding like a 1993 soundcard FM synth.
MusiGenesis
Is it possible that what's bothering you about raising the pitch is the Alvin and the Chipmunks effect? Do you know anything about formant filters? (But the linear interpolation is absolutely adding noise 10dB under your signal.)
Nosredna
I'm doing a version of wavetable synthesis, but my code forms the amplitude envelopes for all the notes, so the overall shape is the same regardless of pitch. The particular problem I'm having is with the sustain part of a note sounding kind of "scratchy" (if that makes sense) when I create a note at a higher frequency. The Hermite interpolation seems to make this part sound much better and clearer, but it might just be that my audio neurons are dying at a faster rate today.
MusiGenesis
How are you doing sustain? Looping the same section over and over? Where does that section come from? You may be lifting some low frequency noise into audible range.
Nosredna
It's a loop approach. I don't think the problem is low frequency noise lifted into the audible. The "scratchiness" is very high frequency - I can get rid of it with a lowpass filter, but I don't like the overall dulling effect and I'm trying to avoid having to do an FFT.
MusiGenesis
Any frequencies near Nyquist could do that to you when you speed up. You have to do the lowpass before you speed up, or else you'll be cutting out sounds that belong there, not just the ones that folded over (that's why you're getting a lack of brightness, probably). Suppose you have a sample you want to play at 44100. You have a sample you want to play at double speed. Put the lowpass cutoff at 11025, do the filter, then speed up. You don't want any frequencies going over Nyquist.
Nosredna
A: 

Honestly, cubic interpolation isn't generally much better for audio than linear. A simple suggestion for improving your linear interpolation would be to use an antialiasing filter (before or after the interpolation, depending on whether you are shortening the signal or lengthening it). Another option (though more computationally expensive) is sinc-interpolation, which can be done with very high quality.

We have released some simple, LGPL resampling code that can do both of these as part of WDL (see resample.h).

Justin Frankel