views:

200

answers:

4

Hi, I've been working on a tool to transcribe recordings of speech with Javascript. Basically I'm hooking up key events to play, pause, and loop a file read in with the audio tag.

There are a number of advanced existing desktop apps for doing this sort of thing (such as Transcriber -- here's a screenshot). Most transcription tools have a built-in waveform that can be used to jump around the audio file, which is very helpful because the transcriber can learn to visually find and repeat or loop phrases.

I'm wondering if it's possible to emulate a subset of this functionality in the browser, with Javascript. I don't know much about signal processing, perhaps it's not even feasible.

But what I envision is Javascript reading the sound stream from the file, and periodically sampling the amplitude. If the amplitude is very low for longer than a certain threshhold of time, then that would be labled as a phrase break.

Such labeling, I think, would be very useful for transcription. I could then set up key commands to jump to the previous period of silence. So hypothetically (imagining a jQuery-based API):

var audio = $('audio#someid');

var silences = silenceFindingVoodoo(audio);

silences will then contain a list of times, so I can hook up some way to let the user jump around through the various silences, and then set the currentTime to a chosen value, and play it.

Is it even conceivable to do this sort of thing with Javascript?

+1  A: 

As far as I know, JavaScript is not powerful enough to do this.

You'll have to resort to flash, or some sort of server side processing to do this.

With the HTML5 audio/video tags, you might be able to trick the page into doing something like this. You could (hypothetically) identify silences server-side and send the timestamps of those silences to the client as meta data in the page (hidden fields or something) and then use that to allow JavaScript to identify those spots in the audio file.

Dan Herbert
+1  A: 

If you use WebWorker threads you may be able to do this in Javascript, but that would require using more threads in the browser to do this. You could break up the problem into multiple threads and process it, but, it would be all but impossible to synchronize this with the playback. So, Javascript can determine the silent periods, by doing some audio processing, but since you can't link that to the playback well it would not be the best choice.

But, if you wanted to show the waveforms to the user then javascript and canvas can be used for this, but then see the next paragraph for the streaming.

Your best bet would be to have the server stream the audio and it can do the processing and find all the silences. Each of these should then be saved in a separate file, so that you can easily jump between the silences, and by streaming, your server app can determine when to load up the new file, so there isn't a break.

James Black
+1  A: 

I don't think JavaScript is the tool you want to use to use for processing those audio files - that's asking for trouble. However, javascript could easily read a corresponding XML file which describes where those silences occur in the audio file, adjusting the user interface appropriately. Then, the question is what do you use to generate those XML files:

  1. You can do it manually if you need to demo the capability right away. (Use audacity to see where those audio envelopes occur)

  2. Check out this CodeProject article, which creates a wav processing library in C#. The author has created a function to extract silence from the input file. Probably a good place to start hacking.

Just two of my initial thoughts ... There are ALOT of audio processing APIs out there, but they are written for particular frameworks and application programming languages. Definitely make use of them before trying to write something from scratch ... unless you happen to really love fourier transforms.

Matt
+2  A: 

I think this is possible using javascript (although maybe not advisable, of course). This article:

https://developer.mozilla.org/En/Using_XMLHttpRequest#Handling_binary_data

... discusses how to access files as binary data, and once you have the audio file as binary data you could do whatever you like with it (I guess, anyway - I'm not real strong with javascript). With audio files in WAV format, this would be a trivial exercise, since the data is already organized by samples in the time domain. With audio files in a compressed format (like MP3), transforming the compressed data back into time-domain samples would be so insanely difficult to do in javascript that I would found a religion around you if you managed to do it successfully.

Update: after reading your question again, I realized that it might actually be possible to do what you're discussing in javascript, even if the files are in MP3 format and not WAV format. As I understand your question, you're actually just looking to locate points of silence within the audio stream, as opposed to actually stripping out the silent stretches.

To locate the silent stretches, you wouldn't necessarily need to convert the frequency-domain data of an MP3 file back into the time-domain of a WAV file. In fact, identifying quiet stretches in audio can actually be done more reliably in the frequency domain than in the time domain. Quiet stretches tend to have a distinctively flat frequency response graph, whereas in the time domain the peak amplitudes of audible speech are sometimes not much higher than the peaks of background noise, especially if auto-leveling is occurring.

Analyzing an MP3 file in javascript would be significantly easier if the file were CBR (constant bit rate) instead of VBR (variable bit rate).

MusiGenesis
pat