I am currently trying to implement basic speech recognition in AS3. I need this to be completely client side, as such I can't access powerful server-side speech recognition tools. The idea I had was to detect syllables in a word, and use that to determine the word spoken. I am aware that this will grealty limit the capacities for recognition, but I only need to recognize a few key words and I can make sure they all have a different number of syllables.
I am currently able to generate a 1D array of voice level for a spoken word, and I can clearly see, if I somehow draw it, that there are distinct peaks for the syllables in most of the cases. However, I am completely stuck as to how I would find out those peaks. I only really need the count, but I suppose that comes with finding them. At first I thought of grabbing a few maximum values and comparing them with the average of values but I had forgot about that peak that is bigger than the others and as such, all my "peaks" were located on one actual peak.
I stumbled onto some Matlab code that looks almost too short to be true, but I can't very that as I am unable to convert it to any language I know. I tried AS3 and C#. So I am wondering if you guys could start me on the right path or had any pseudo-code for peak detection?