I'm starting to create a proof of concept for an idea I have, and at this point, I need some guidance as to how I should begin.

I need to sample the microphone input and process that signal in real time (think Auto-Tune, but working live), as opposed to recording for a while and processing afterwards.

What I'm doing is "kind of" a "mic input to MIDI converter", so it needs to respond quite fast.

I investigated a bit online, and apparently the way to go is either DirectSound or the WaveIn* API functions. Now, according to what I read, the WaveIn APIs will let me fill a buffer of a certain size, which is fine for recording and post-processing, but I'm wondering... How do I do real-time processing?

Do I use 10ms buffers, keep a circular 50ms or 100ms array myself, and have a function trigger the analysis every 10ms? (It would have access to the latest 100ms of input, of which only 10ms are new.)

Am I missing something here?

Also, how is this done with DirectSound? Does it give me any improved capabilities over the regular Win32 APIs?

+2  A: 

Here is a link to a program (with source) in C++ that does real-time frequency analysis.

Bork Blatt
+5  A: 

Both DirectSound and the Wave API ultimately give you buffers filled with audio data that you can process. The size of these buffers can be varied, but realistically you will need to keep latency to less than 10 ms for useful real-time processing. This means processing data within 10 ms of it arriving at the buffer, minus the time it takes to get from the audio hardware into the buffer, which will depend on the driver. For this reason I would recommend processing no more than 5 ms of data at a time (at 44.1 kHz, 16-bit mono, that's roughly 220 samples per block).

The main architectural difference between the two is that with DirectSound you allocate a circular buffer which is then filled by the DirectSound audio driver whereas the Wave API takes a queue of pre-allocated WAVEHDR buffers which are filled, returned to the app and then recycled. There are various notification methods for both APIs, such as window messages or events. However, for low-latency processing it's probably advisable to maintain a dedicated streaming thread and wait for new data to arrive.
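To make the Wave API side concrete, here is a rough, untested sketch of that prepare/add/recycle cycle: four ~5 ms WAVEHDR buffers, an event-based callback and a streaming loop. ProcessBlock is a placeholder for your own analysis, and all error handling is omitted.

    // Sketch only, not production code: 44.1 kHz 16-bit mono capture with a
    // queue of four ~5 ms WAVEHDR buffers and an event-driven streaming loop.
    #include <windows.h>
    #include <mmsystem.h>
    #pragma comment(lib, "winmm.lib")

    const int SAMPLE_RATE   = 44100;
    const int BLOCK_SAMPLES = SAMPLE_RATE * 5 / 1000;  // ~5 ms per buffer
    const int NUM_BLOCKS    = 4;

    void ProcessBlock(const short* samples, int count)
    {
        // pitch detection / analysis goes here
    }

    int main()
    {
        WAVEFORMATEX fmt = {};
        fmt.wFormatTag      = WAVE_FORMAT_PCM;
        fmt.nChannels       = 1;
        fmt.nSamplesPerSec  = SAMPLE_RATE;
        fmt.wBitsPerSample  = 16;
        fmt.nBlockAlign     = fmt.nChannels * fmt.wBitsPerSample / 8;
        fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

        HANDLE hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
        HWAVEIN hwi = NULL;
        waveInOpen(&hwi, WAVE_MAPPER, &fmt, (DWORD_PTR)hEvent, 0, CALLBACK_EVENT);

        static short   data[NUM_BLOCKS][BLOCK_SAMPLES];
        static WAVEHDR hdr[NUM_BLOCKS] = {};
        for (int i = 0; i < NUM_BLOCKS; ++i) {
            hdr[i].lpData         = (LPSTR)data[i];
            hdr[i].dwBufferLength = BLOCK_SAMPLES * sizeof(short);
            waveInPrepareHeader(hwi, &hdr[i], sizeof(WAVEHDR));
            waveInAddBuffer(hwi, &hdr[i], sizeof(WAVEHDR));
        }
        waveInStart(hwi);

        for (;;) {  // streaming loop: wait for a buffer, process it, recycle it
            WaitForSingleObject(hEvent, INFINITE);
            for (int i = 0; i < NUM_BLOCKS; ++i) {
                if (hdr[i].dwFlags & WHDR_DONE) {
                    ProcessBlock((short*)hdr[i].lpData,
                                 (int)(hdr[i].dwBytesRecorded / sizeof(short)));
                    hdr[i].dwFlags &= ~WHDR_DONE;
                    waveInAddBuffer(hwi, &hdr[i], sizeof(WAVEHDR));  // back to the queue
                }
            }
        }
        // cleanup (waveInStop/waveInReset/waveInUnprepareHeader/waveInClose) omitted
    }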

For various reasons I would recommend DirectSound over the Wave API for new development - it will certainly be easier to achieve lower latency.
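For comparison, an equally rough sketch of the DirectSound approach: one circular capture buffer that the driver fills continuously, with the app reading whatever lies between its last read point and the current read cursor. Again untested, with error handling omitted and ProcessBlock standing in for your own analysis.

    // Sketch only: DirectSound capture into a 100 ms circular buffer,
    // polled from a streaming loop. Error checks and cleanup omitted.
    #include <windows.h>
    #include <dsound.h>
    #pragma comment(lib, "dsound.lib")

    void ProcessBlock(const short* samples, int count)
    {
        // pitch detection / analysis goes here
    }

    int main()
    {
        // mono, 16-bit, 44.1 kHz
        WAVEFORMATEX fmt = { WAVE_FORMAT_PCM, 1, 44100, 88200, 2, 16, 0 };

        IDirectSoundCapture8* dsc = NULL;
        DirectSoundCaptureCreate8(NULL, &dsc, NULL);   // NULL = default device

        DSCBUFFERDESC desc = {};
        desc.dwSize        = sizeof(desc);
        desc.dwBufferBytes = fmt.nAvgBytesPerSec / 10;  // 100 ms ring buffer
        desc.lpwfxFormat   = &fmt;

        IDirectSoundCaptureBuffer* buf = NULL;
        dsc->CreateCaptureBuffer(&desc, &buf, NULL);
        buf->Start(DSCBSTART_LOOPING);

        DWORD lastRead = 0;
        for (;;) {  // streaming loop: consume whatever the driver has written so far
            DWORD capturePos = 0, readPos = 0;
            buf->GetCurrentPosition(&capturePos, &readPos);
            DWORD avail = (readPos + desc.dwBufferBytes - lastRead) % desc.dwBufferBytes;
            if (avail > 0) {
                void *p1 = NULL, *p2 = NULL;
                DWORD n1 = 0, n2 = 0;
                buf->Lock(lastRead, avail, &p1, &n1, &p2, &n2, 0);
                ProcessBlock((short*)p1, (int)(n1 / sizeof(short)));
                if (p2) ProcessBlock((short*)p2, (int)(n2 / sizeof(short)));  // wrap-around
                buf->Unlock(p1, n1, p2, n2);
                lastRead = (lastRead + avail) % desc.dwBufferBytes;
            }
            Sleep(1);  // in practice use position notifications or an event instead
        }
    }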

Whichever method you choose for the capturing, once you have your data you simply pass it to your processing algorithm and wait for the next buffer to be ready. As long as you can process the data faster than it arrives, you'll have your (pseudo) real-time analysis.
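If your analysis needs a longer window than each capture block (the 100 ms circular array you mention in the question), one simple approach is a sliding window that each new block is appended to. A sketch follows; SlidingWindow and AnalysePitch are made-up names, not part of any API.

    // Sliding analysis window fed by small capture blocks.
    #include <vector>
    #include <algorithm>
    #include <cstring>

    class SlidingWindow {
    public:
        explicit SlidingWindow(int capacity) : buf_(capacity, 0) {}

        // Append a freshly captured block, discarding the oldest samples.
        void Push(const short* samples, int count) {
            count = std::min<int>(count, (int)buf_.size());
            std::memmove(buf_.data(), buf_.data() + count,
                         (buf_.size() - count) * sizeof(short));
            std::memcpy(buf_.data() + buf_.size() - count, samples,
                        count * sizeof(short));
        }

        const short* Data() const { return buf_.data(); }
        int          Size() const { return (int)buf_.size(); }

    private:
        std::vector<short> buf_;
    };

    // Usage inside the capture loop:
    //   SlidingWindow window(44100 / 10);            // latest 100 ms at 44.1 kHz
    //   window.Push(block, blockSamples);            // every ~5-10 ms
    //   AnalysePitch(window.Data(), window.Size());  // sees 100 ms, only 5-10 ms new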

There are also alternative APIs that may be more suitable. Have a look at ASIO, Kernel Streaming (for XP only - I wouldn't bother) and, new in Vista, the Core Audio APIs.

Stu Mackellar
Good explanation, but I'm not sure why you think DirectSound is better than waveIn* for latency. With both approaches, the latency is purely a function of how long you record into a buffer before processing it. However, I would also recommend DirectSound given that it's a more modern API. I can't believe waveIn* and waveOut* are even still around (and they're even available in Windows Mobile, which blew my mind when I discovered it).
MusiGenesis
As I understand it, the reason for DirectSound having a potentially lower latency is that it is able to do a direct DMA copy to the user's buffer - the Wave API does not do this and requires another copy in between. Also, when using the Wave API you can't take exclusive control of the hardware which may mean that kmixer starts doing sample rate conversion or bit depth conversion. All of this extra processing adds up but, granted, it may not be significant compared to the inherent buffering latency. These factors may also change with the OS version.
Stu Mackellar
A: 

Did you manage to carry out this task? Does anyone have code for this project?

Many thanks

Ahmed