This is my "weekend" hobby problem.

I have some well-loved single-cycle waveforms from the ROMs of a classic synthesizer.

These are 8-bit samples (256 possible values).

Because they are only 8 bits, the noise floor is pretty high, thanks to quantization error. Quantization error is pretty weird stuff; it adds a bit of noise across all frequencies.

I'd like to take these cycles and make "clean" 16-bit versions of them. (Yes, I know people love the dirty versions, so I'll let the user interpolate between dirty and clean to whatever degree they like.)

It sounds impossible, right? The low 8 bits are lost forever. But this has been in the back of my head for a while, and I'm pretty sure I can do it.

Remember that these are single-cycle waveforms that just get repeated over and over for playback, so this is a special case. (Of course, the synth does all kinds of things to make the sound interesting, including envelopes, modulations, filters, cross-fading, etc.)

For each individual byte sample, what I really know is that it's one of 256 values in the 16-bit version. (Imagine the reverse process, where the 16-bit value is truncated or rounded to 8 bits.)
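To make that concrete, here's a tiny sketch of the constraint (I'm assuming plain truncation and unsigned bytes here; if the ROM used rounding or signed samples, the range shifts):

    # All 16-bit values whose high byte equals the known 8-bit sample.
    def candidates_for(s8):
        lo = s8 << 8                  # e.g. 0x3A -> 0x3A00
        return range(lo, lo + 256)    # 0x3A00 .. 0x3AFF

    # Every candidate truncates back to the original byte.
    assert all((v >> 8) == 0x3A for v in candidates_for(0x3A))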

My evaluation function is trying to get the minimum noise floor. I should be able to judge that with one or more FFTs.

Exhaustive testing would probably take forever, so I could take a lower-resolution first pass. Or do I just push randomly chosen values around (within the known values that would keep the same 8-bit version), do the evaluation, and keep the cleaner version? Or is there something faster I can do? Am I in danger of falling into local minima when there might be better minima elsewhere in the search space? I've had that happen in other similar situations.

Are there any initial guesses I can make, maybe by looking at neighboring values?


Edit: Several people have pointed out that the problem is easier if I remove the requirement that the new waveform would sample to the original. That's true. In fact, if I'm just looking for cleaner sounds, the solution is trivial.

+2  A: 

You could put your existing 8-bit sample into the high-order byte of your new 16-bit sample, and then use the low order byte to linear interpolate some new 16 bit datapoints between each original 8-bit sample.

This would essentially connect a 16 bit straight line between each of your original 8-bit samples, using several new samples. It would sound much quieter than what you have now, which is a sudden, 8-bit jump between the two original samples.

You could also try applying some low-pass filtering.
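Something like this, roughly (numpy; I'm assuming unsigned 8-bit input, and the upsample factor is arbitrary):

    import numpy as np

    def upsample_linear(wave8, factor=4):
        """Put each 8-bit sample in the high byte of a 16-bit value, then
        linearly interpolate `factor` points per original sample."""
        hi = np.asarray(wave8, dtype=np.float64) * 256.0   # 8-bit data -> high byte
        n = len(hi)
        # Wrap the last point back to the first, since these are
        # single-cycle (periodic) waveforms.
        closed = np.append(hi, hi[0])
        x_new = np.arange(n * factor) / factor
        return np.interp(x_new, np.arange(n + 1), closed).astype(np.uint16)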

Robert Harvey
That was the first thing I did. ;-) First I tried linear, then I switched to a better method. It does a little cleaning, just because hitting in-between the samples works better. It's a tiny bit better than raw, but it doesn't really address the fact that the points, thanks to the quantization error, are in the wrong place.
Nosredna
Maybe I'm getting your suggestion wrong but wouldn't that rather double the sampling rate instead of improving the S/N ratio?
0xA3
Likewise, low-pass does remove high-frequency noise, but it doesn't help with noise below the top of the signal. And, of course, it affects phase. Basically, all these old synths had low-pass filters anyway. I'm really going after the quantization error hardcore here. :-) But thanks for the suggestions.
Nosredna
I think this would help only to upsample the waveform, not to keep it at the same sample rate and just increase the sample size.
cube
divo, I'm open to adding sample slots and compensating with a higher movement through the data, regardless. So that doesn't really matter.
Nosredna
My point was rather that I don't see how this would reduce noise. You would instead need to correct the original sample values, e.g. by applying an FFT and doing some filtering in the frequency domain. Btw, interpolation would be done by a reconstruction filter in the DAC anyway.
0xA3
If you can divine what the waveforms are, you could try reproducing them yourself. Things like square waves, saw waves, etc.
Robert Harvey
@Robert Harvey. Well, a few are obvious of course. You can see what they are trying to be from the partials.
Nosredna
It almost sounds like a curve-fitting problem.
Robert Harvey
I think it's an optimization problem.
Nosredna
+1  A: 

Going with the approach in your question, I would suggest looking into hill-climbing algorithms and the like.

http://en.wikipedia.org/wiki/Hill_climbing has more information on it and the sidebox has links to other algorithms which may be more suitable.
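Very roughly, a hill climb over the low bytes might look like this (Python/numpy; the noise metric and the single-sample move are placeholders you'd want to tune, and restarts or annealing help with local minima):

    import numpy as np

    def noise_cost(wave16, n_keep=16):
        """Crude noise-floor metric: total spectral magnitude outside the
        n_keep strongest bins (assumed to be the real harmonics)."""
        mag = np.abs(np.fft.rfft(wave16 - np.mean(wave16)))
        mag.sort()
        return mag[:-n_keep].sum()

    def hill_climb(wave8, iters=20000, seed=0):
        """Search only the low bytes, so the high byte (the original
        8-bit data) is preserved by construction."""
        rng = np.random.default_rng(seed)
        hi = np.asarray(wave8, dtype=np.int64) << 8
        best = hi | rng.integers(0, 256, size=len(hi))    # random low bytes to start
        best_cost = noise_cost(best.astype(np.float64))
        for _ in range(iters):
            trial = best.copy()
            i = rng.integers(len(hi))
            trial[i] = hi[i] | rng.integers(0, 256)       # perturb one low byte
            cost = noise_cost(trial.astype(np.float64))
            if cost < best_cost:
                best, best_cost = trial, cost
        return best.astype(np.uint16)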

AI is like alchemy - we never reached the final goal, but lots of good stuff came out along the way.

MaHuJa
That's exactly what I intend. My usual strategy is that per pass I 1) choose how many numbers to vary, 2) decide on a random range of movement for each, 3) generate a random value in that range, and 4) do the eval and decide whether to start there again. That usually keeps me out of local mins.
Nosredna
+1  A: 

Well, I would expect some FIR filtering (IIR if you really need to save processing cycles, but FIR can give better results without instability) to clean up the noise. You would have to play with it to get the effect you want, but the basic problem is smoothing out the sharp edges in the audio created by sampling it at 8-bit resolution. I would give a wide berth to the center frequency of the audio and do a low-pass filter, and then listen to make sure I didn't make it sound "flat" with the filter I picked.

It's tough, though; there is only so much you can do. The lower 8 bits are lost, and the best you can do is approximate them.

It's almost impossible to get rid of noise that looks like your signal. If you start tweaking stuff in your frequency band, it will take out the signal of interest.

For upsampling, since you're already using an FFT, you can add zeros to the end of the frequency-domain signal and do an inverse FFT. This completely preserves the frequency and phase information of the original signal, although it spreads the same energy over more samples. If you shift it 8 bits to make 16-bit samples first, this won't be too much of a problem. But I usually kick it up by an integer gain factor before doing the transform.
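A rough numpy version of that zero-padding trick (assuming a single cycle; I'm glossing over the bookkeeping for the Nyquist bin):

    import numpy as np

    def fft_upsample(wave16, factor=2):
        """Upsample by zero-padding the spectrum: same frequency/phase
        content spread over more samples, rescaled by the gain factor."""
        x = np.asarray(wave16, dtype=np.float64)
        n_out = len(x) * factor
        spec = np.fft.rfft(x)
        padded = np.zeros(n_out // 2 + 1, dtype=complex)
        padded[:len(spec)] = spec
        # irfft divides by the (longer) output length, so the result comes
        # out quieter; multiply by `factor` to restore the amplitude.
        return np.fft.irfft(padded, n=n_out) * factor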

Pete

Edit: The comments are getting a little long so I'll move some to the answer.

The peaks in the FFT output are harmonic spikes caused by the quantization. I tend to think of them differently than the noise floor. You can dither, as someone mentioned, which eliminates the amplitude of the harmonic spikes and flattens out the noise floor, but you lose overall signal-to-noise on the flat part of your noise floor. As far as the FFT is concerned, when you interpolate using that method, the same energy is retained but spread over more samples, which reduces the amplitude. So before doing the inverse, give your signal more energy by multiplying by a gain factor.

Are the signals simple/complex sinusoids, or do they have hard edges? i.e., triangle waves, square waves, etc. I'm assuming they have continuity from cycle to cycle; is that valid? If so, you can also increase your FFT resolution to more precisely pinpoint frequencies by increasing the number of waveform cycles fed to your FFT. If you can precisely identify the frequencies used, assuming they are somewhat discrete, you may be able to completely recreate the intended signal.

The 16-bit to 8-bit via truncation requirement will produce results that do not match the original source. (Thus making finding an optimal answer more difficult.) Typically you would produce a fixed-point waveform by attempting to get the closest match, which means rounding to the nearest number (truncation is a floor operation). That is most likely how they were originally generated. Adding 0.5 (in this case 0.5 is 128) and then truncating the output would allow you to generate more accurate results. If that's not a worry, then OK, but it definitely will have a negative effect on accuracy.

UPDATED: Why? Because the goal of sampling a signal is to reproduce it as closely as possible. If the conversion threshold is set poorly during sampling, all of your error is to one side of the signal rather than being well distributed and centered about zero. On such systems you typically try to maximize use of the available dynamic range, particularly if you have low resolution such as an 8-bit ADC.
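In code, the two 16-to-8 mappings I mean (unsigned, just to illustrate):

    def truncate_to_8(s16):
        return s16 >> 8                      # floor: drop the low byte

    def round_to_8(s16):
        return min((s16 + 128) >> 8, 255)    # add half an LSB (128) first, clamp at 255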

Band-limited versions? If they are filtered at different frequencies, I'd suspect it was to allow you to play the same sound without distortion when you went too far out of range for the other variation. Kinda like mipmapping in graphics. I suspect the two are the same signal with different anti-aliasing filters applied; this may be useful in reproducing the original. They should be the same base signal with different convolutions applied.

NoMoreZealots
Luckily, looking at the signal, in most cases it's pretty obvious--the noise doesn't look like the signal. I realize I can't recover the actual signal, but I'll be happy with dropping the noise floor out of audible range. I know I _could_ do the inverse FFT. My goal is to be able to divide the 16-bit samples by 256 and have the same data I started with. That's why I was worried about the inverse FFT. Can you explain the last thing you said about needing to kick up the gain? I didn't catch what you meant there.
Nosredna
My thought is that _any_ 16-bit result I get that can be truncated back to the original 8-bit cycle is "valid" in the sense that it was a possible input. But there are clear(-ish) signal peaks surrounded by this floor of noise that was not there in the sound that was recorded. I know that because of interviews with synth makers of the era, who moved to 12- and 16-bit DACs as soon as they could afford to.
Nosredna
There aren't peaks from quantization. At least not too many and not too high. It really is more like a noise floor. I've done bit-crusher and quantization effects, so I'm pretty familiar with the noise you get as a result. For the samples I have, the waveforms that have hard edges often have a second bandlimited version to be used in higher registers. I don't know enough history to know how they were created. If I went down the inverse-FFT path, I'd still make sure that I can get from 16-bit to 8-bit by taking the high byte. That's a requirement.
Nosredna
Yes the bandlimited versions are a lot like mipmapping. That's a good analogy. You can play higher notes with them without aliasing.
Nosredna
Why do you think the sampler rounded rather than floored? Doesn't the analog to digital just accumulate the voltage or something until the next pulse comes along? Admittedly, that whole analog to digital thing is not my forte, but I have DSP books that explain how some samplers work.
Nosredna
+1  A: 

There might be a simple approach taking advantange of the periodicity of the waveforms. How about if you:

  1. Make a 16-bit waveform where the high bytes are the waveform and the low bytes are zero - call it x[n].

  2. Calculate the discrete Fourier transform of x[n] = X[w].

  3. Make a signal Y[w] = (dBMag(X[w]) > Threshold) ? X[w] : 0, where dBMag(k) = 10*log10(real(k)^2 + imag(k)^2), and Threshold is maybe 40 dB, based on 8 bits being roughly 48 dB dynamic range, and allowing ~1.5 bits of noise.

  4. Inverse transform Y[w] to get y[n], your new 16 bit waveform.

  5. If y[n] doesn't sound nice, dither it with some very low level noise.

Notes:

A. This technique only works if the original waveforms are exactly periodic!

B. Step 5 might be replaced with setting the "0" values to random noise in Y[w] in step 3; you'd have to experiment a bit to see what works better.

This seems easier (to me at least) than an optimization approach. But truncated y[n] will probably not be equal to your original waveforms. I'm not sure how important that constraint is. I feel like this approach will generate waveforms that sound good.
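A rough numpy sketch of steps 1-4 (I'm interpreting the threshold as relative to the strongest bin; the exact level is something to tune):

    import numpy as np

    def denoise_by_threshold(wave8, threshold_db=40.0):
        """Steps 1-4: promote to 16 bits, FFT, zero every bin more than
        threshold_db below the strongest bin, inverse FFT."""
        x = np.asarray(wave8, dtype=np.float64) * 256.0     # step 1: data in the high byte
        X = np.fft.rfft(x)                                  # step 2
        mag_db = 20.0 * np.log10(np.abs(X) + 1e-12)
        keep = mag_db > (mag_db.max() - threshold_db)       # step 3
        y = np.fft.irfft(np.where(keep, X, 0.0), n=len(x))  # step 4
        # Step 5 (optional): add very low-level dither noise to y.
        return y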

mtrw
The constraint is crucial. I want to be able to have waveforms which would sample down to 8-bits and match. Otherwise the problem is easy. But I'll think about this.
Nosredna
Maybe you could do an optimization approach where the cost function to minimize is |y[n] - x[n]| + |Y[w]|, and the search space is the "0" values of Y[w]. The |y[n] - x[n]| part of the cost function would keep y[n] matching x[n] in the high-order bits, and the |Y[w]| part would push the noise floor down. I don't know how the optimization would deal with the inverse Fourier transform between the search space and the cost function; I have very little experience with optimization techniques.
mtrw