I'm wondering if there are any examples atomic examples out there for streaming audio FROM the iPhone to a server. I'm not interested in telephony or SIP style solutions, just a simple socket stream to send an audio clip, in .wav format, as it is being recorded. I haven't had much luck with the google or other obvious avenues, although there seem to be many examples of doing this the other way around.
i cant figure out how to register the unregistered account i initially posted with.
anyway, I'm not really interested in the audio format at present, just the streaming aspect. i want to take the microphone input, and stream it from the iphone to a server. i dont presently care about the transfer rate as ill initially just test from a wifi connection, not the 3g setup. the reason i cant cache it is because im interested in trying out some open source speech recognition stuffs for my undergraduate thesis. caching and then sending the recording is possible but then it takes considerably longer to get the voice data to the server. if i can start sending the data as soon as i start recording, then the response time is considerably improved because most of the data will have already reached the server by the time i let go of the record button. furthermore, if i can get this streaming functionality to work from the iphone then on the server side of things i can also start the speech recognizer as soon as the first bit of audio comes through. again this should considerably speech up the final amount of time that the transaction takes from the user perspective.
colin barrett mentions the phones and phone networks, but these are actually a pretty suboptimal solution for asr, mainly because they provide no good way to recover from errors - doing so over a voip dialogue is a horrible experience. however, the iphone and in particular the touch screen provide a great way to do that, through use of an ime or nbest lists for the other recognition candidates.
if i can figure out the basic architecture for streaming the audio, then i can start thinking about doing flac encoding or something to reduce the required transfer rate. maybe even feature extraction, although that limits the later ability to retrain the system with the recordings.