views:

74

answers:

3

I am currently trying to implement basic speech recognition in AS3. I need this to be completely client side, as such I can't access powerful server-side speech recognition tools. The idea I had was to detect syllables in a word, and use that to determine the word spoken. I am aware that this will grealty limit the capacities for recognition, but I only need to recognize a few key words and I can make sure they all have a different number of syllables.

I am currently able to generate a 1D array of voice level for a spoken word, and I can clearly see, if I somehow draw it, that there are distinct peaks for the syllables in most of the cases. However, I am completely stuck as to how I would find out those peaks. I only really need the count, but I suppose that comes with finding them. At first I thought of grabbing a few maximum values and comparing them with the average of values but I had forgot about that peak that is bigger than the others and as such, all my "peaks" were located on one actual peak.

I stumbled onto some Matlab code that looks almost too short to be true, but I can't very that as I am unable to convert it to any language I know. I tried AS3 and C#. So I am wondering if you guys could start me on the right path or had any pseudo-code for peak detection?

+3  A: 

The matlab code is pretty straightforward. I'll try to translate it to something more pseudocodeish.

It should be easy to translate to ActionScript/C#, you should try this and post follow-up questions with your code if you get stuck, this way you'll have the best learning effect.

Param: delta (defines kind of a tolerance and depends on your data, try out different values)
min = Inf (or some very high value)
max = -Inf (or some very low value)
lookformax = 1
for every datapoint d [0..maxdata] in array arr do
  this =  arr[d]
  if this > max
    max = this
    maxpos = d
  endif
  if this < min
    min = this
    minpos = d
  endif

  if lookformax == 1
    if this < max-delta
      there's a maximum at position maxpos
      min = this
      minpos = d
      lookformax = 0
    endif
  else
    if this > min+delta
      there's a minimum at position minpos
      max = this
      maxpos = d
      lookformax = 1
    endif
  endif
schnaader
That worked beautifully! Thanks!
Xeon06
+1  A: 

Finding peaks and valleys of a curve is all about looking at the slope of the line. At such a location the slope is 0. As i am guessing a voice curve is very irregular, it must first be smoothed, until only significant peaks exist.

So as i see it the curve should be taken as a set of points. Groups of points should be averaged to produce a simple smooth curve. Then the difference of each point should be compared, and points not very different from each other found and those areas identified as a peak, valleys or plateau.

Andrey
+1  A: 

If anyone wants the final code in AS3, here it is:

function detectPeaks(values:Array, tolerance:int):void
{


var min:int = int.MIN_VALUE;
var max:int = int.MAX_VALUE;
var lookformax:int = 1;
var maxpos:int = 0;
var minpos:int = 0;

for(var i:int = 0; i < values.length; i++)
{
    var v:int = values[i];
    if (v > max)
    {
        max = v;
        maxpos = i;
    }
    if (v < min)
    {
        min = v;
        minpos = i;
    }

    if (lookformax == 1)
    {
        if (v < max - tolerance)
        {
            canvas.graphics.beginFill(0x00FF00);
            canvas.graphics.drawCircle(maxpos % stage.stageWidth, (1 - (values[maxpos] / 100)) * stage.stageHeight, 5);
            canvas.graphics.endFill();

            min = v;
            minpos = i;
            lookformax = 0;
        }
    }
    else
    {
        if (v > min + tolerance)
        {
            canvas.graphics.beginFill(0xFF0000);
            canvas.graphics.drawCircle(minpos % stage.stageWidth, (1 - (values[minpos] / 100)) * stage.stageHeight, 5);
            canvas.graphics.endFill();

            max = v;
            maxpos = i;
            lookformax = 1;
        }
    }
}

}

Xeon06