views:

856

answers:

3

I am doing some maintenance work involving DirectSound buffers. I would like to know how to interpret the elements in the buffer, that is, to know what each value in the buffer represents. This data is coming from a microphone.

This wave format is being used:

WAVEFORMATEXTENSIBLE format = {
  { WAVE_FORMAT_EXTENSIBLE, 1, sample_rate, sample_rate * 4, 4, 32, 22 },
  { 32 }, 0, KSDATAFORMAT_SUBTYPE_IEEE_FLOAT
};

My goal is to detect microphone silence. I am currently accomplishing this by simply determining if all values in the buffer fail to exceed some threshold volume value, assuming that the intensity of each buffer element directly corresponds to volume.

This what I am currently trying:

bool is_mic_silent(float * data, unsigned int num_samples, float threshold)
{
  float * max_iter = std::max_element(data, data + num_samples);
  if(!max_iter) {
    return true;
  }

  float max = *max_iter;
  if(max < threshold) {
    return true;
  }

  return false;  // At least one value is sufficiently loud.
}
A: 

From here, floating point PCM values are from [-1, 1].

MSN
+2  A: 

As MSN said the samples are in 32-bit floats. To detect a silence you would normally calculate the RMS value: Take the average of the squared sample values over some time interval (say 20-50 ms) and compare (square root of) this average to a threshold. The noise inherent in the microphone signal may let single samples reach above the threshold while the ambient sound would still be considered silence. The averaging over a short interval will result in a value that corresponds better with our perception.

Han
A: 

In addition to Han's suggestion to average samples, als consider calibrating your threshold value. Under different environments, with different microphones and different audio channels, "silence" can mean a lot of things.

The simple way would be loowing to configure the threshold. Alternatively, allow a "Noise floor measurement" where you acqurie a threshold value.

Note that the samples are linear, but levels in audio processing are usually given in dB. So depending on yoru target audience, you may want to convert readings and inputs to/from dB.

peterchen