I am developing a system as an aid to musicians performing transcription. The aim is to perform automatic music transcription (it does not have to be perfect, as the user will correct glitches / mistakes later) on a single-instrument, monophonic recording. Does anyone here have experience in automatic music transcription? Or digital signal processing in general? Help from anyone is greatly appreciated, no matter what your background.

So far I have investigated the use of the Fast Fourier Transform for pitch detection, and a number of tests in both MATLAB and my own Java test programs have shown it to be fast and accurate enough for my needs. Another element of the task that will need to be tackled is the display of the produced MIDI data in sheet music form, but this is something I am not concerned with right now.
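
For reference, the FFT approach mentioned above can be sketched roughly as follows. This is a minimal, unoptimised Python sketch, not the MATLAB or Java test code referred to in the question. The function names `fft` and `fft_pitch`, the Hann window, and the power-of-two frame length are all assumptions made for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_pitch(frame, sample_rate):
    """Return the frequency (Hz) of the strongest FFT bin in one frame."""
    n = len(frame)
    if n < 2 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    # A Hann window reduces spectral leakage from the frame edges.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = fft(windowed)
    magnitudes = [abs(c) for c in spectrum[:n // 2]]
    # Skip bin 0 (DC) and pick the largest remaining magnitude.
    peak_bin = max(range(1, n // 2), key=lambda k: magnitudes[k])
    return peak_bin * sample_rate / n
```

The result is only accurate to one bin width (`sample_rate / n`), and on a real instrument the strongest bin may be a harmonic rather than the fundamental, so this is a starting point only.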

In brief, what I am looking for is a good method for note onset detection, i.e. the position in the signal where a new note begins. As slow onsets can be quite difficult to detect properly, I will initially be using the system with piano recordings. This is also partially due to the fact I play piano and should be in a better position to obtain suitable recordings for testing. As stated above, early versions of this system will be used for simple monophonic recordings, possibly progressing later to more complex input depending on progress made in the coming weeks.

Thanks, Alan

+2  A: 

I'd like to know how you do it.

Maybe these would help, just basic Google search results.

1 TECHNIQUES FOR AUTOMATIC MUSIC TRANSCRIPTION Juan Pablo Bello ...
Onset Detection, Music Transcription and Ornamentation Detection ...

The second one says that finding onsets is difficult, especially for slow onsets (Section 2, page 2).

waynecolvin
I forgot to mention the slow onset problem in my question, I've updated it now. I will initially be focusing on instruments with a hard onset. Thanks for the links.
Alan
+2  A: 

What you want to do is often called WAV-to-MIDI (google "wav-to-midi"). There have been many attempts at this process, with varying results (note onset is one of the difficulties; polyphony is much harder to deal with). I'd recommend starting with a thorough search of the off-the-shelf solutions, and only start work on your own if there's nothing acceptable out there.

The other part of the process you'd need is something to render the MIDI output as a traditional musical score, but there are umpteen billion products that do that.

Another answer is: yes, I've done a lot of digital signal processing (see the software on my website - it's an infinite-voice software synthesizer written in VB and C), and I'm interested in helping you with this problem. The WAV-to-MIDI part isn't really that difficult conceptually, it's just making it work reliably in practice that's hard. Note onset is just setting a threshold - errors can be easily adjusted forward or backward in time to compensate for note attack differences. Pitch detection is much easier to do on a recording than it is to do in real time, and involves just implementing an auto-correlation routine.
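
To make the two ideas above concrete, here is a rough Python sketch of threshold-based onset detection and autocorrelation pitch detection. It is an illustration under stated assumptions, not the code from my synthesizer: the function names, the threshold of 0.1, the 256-sample window, and the 50 to 1000 Hz search range are all made up for the example, and samples are assumed to be floats in [-1, 1].

```python
def detect_onsets(samples, sample_rate, threshold=0.1, window=256, min_gap_s=0.05):
    """Return times (s) where the windowed peak level first rises above threshold."""
    onsets = []
    above = False
    last_onset = -min_gap_s
    for start in range(0, len(samples) - window + 1, window):
        peak = max(abs(s) for s in samples[start:start + window])
        t = start / sample_rate
        # An onset is a below-to-above transition, at least min_gap_s after the last one.
        if peak >= threshold and not above and t - last_onset >= min_gap_s:
            onsets.append(t)
            last_onset = t
        above = peak >= threshold
    return onsets


def autocorrelation_pitch(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the pitch (Hz) of one frame from its strongest autocorrelation lag."""
    n = len(frame)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    # r[k] is the unnormalised autocorrelation at lag (min_lag + k).
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(min_lag, max_lag + 1)]
    # Skip the initial decline away from the zero-lag peak.
    k = 1
    while k < len(r) and r[k] <= r[k - 1]:
        k += 1
    if k >= len(r):
        return None
    best = max(range(k, len(r)), key=lambda j: r[j])
    if r[best] <= 0:
        return None
    return sample_rate / (min_lag + best)
```

The onset times will be early or late by up to one window, which is the kind of error that can be shifted forward or backward afterwards. The pitch estimate is limited to whole-sample lags, so it gets coarser as the pitch gets higher.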

MusiGenesis
Thanks for the reply. Most of the off-the-shelf solutions I have found are not very good, often with accuracy below 60% even for simple recordings. Besides, this is for my undergrad thesis, so simply taking an off-the-shelf solution isn't an option. I'll update my question with more info now.
Alan
+19  A: 
MusiGenesis
That's a very detailed response, thanks :) I'll have to go through it again to make sure I haven't missed anything, and get back to you with any questions.
Alan
Is the compression you're talking about Dynamic Range Compression?
Alan
@Alan: essentially yes, although you can do non-dynamic range compression as well. Most WAV editors label this effect as "Dynamic Compression", probably to avoid confusion with file-size compression.
MusiGenesis
Thanks. Can you perhaps point me towards an algorithm to achieve either dynamic or non-dynamic range compression? So far all I have been able to find is circuit diagrams for the loudness feature in many amplifiers.
Alan
I'll post one of my own in a second (the effect it produces sounds horrible, but it may work for this purpose). I have also never found code for a dynamic range compressor. I think 99% of DSP work of this type is realtime (as opposed to full-buffer processing).
MusiGenesis
If you find some code you owe me. :)
MusiGenesis
Just added the compression method.
MusiGenesis
Looks good thanks, it's actually a lot simpler than I expected. Having said that I'm sure I'll run into major problems once I start putting everything together.
Alan
It's easy to understand and impossible to solve using the algorithmic approach. If you continue to graduate school (or whatever they call it over there) we can set you up working on the real solution.
MusiGenesis
I understand that this approach will not yield a perfect result; however, I believe it will be good enough for the application. I don't particularly want to start implementing a system using neural nets or genetic algorithms right now, maybe in the future... who knows :)
Alan
At least you know what the real solution is. Good luck, and I would be interested in seeing what you come up with. I know FFT in principle but I've never worked with it hands-on, and I'm especially interested in learning how to use it for pitch detection.
MusiGenesis
Would it be ok to contact you on the email on your site?
Alan
Sure. Just use kennethadams@ instead of support@ or feedback@.
MusiGenesis
The method you gave for compression seems to simply double the amplitude of the signal when I provide param = 2. Should this only be applied to values exceeding a certain threshold?
Alan
That code is pseudo-code. "POW(1.0 - norm, param)" means "take (1.0 - norm) to the power of param" or "(1.0 - norm) ^ param". I don't know what the power method is in Java's math library. Also make sure all your casts are correct.
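
As a rough illustration of how that power term could be used, here is a Python sketch. The compression routine itself is not shown in this thread, so the curve `1 - (1 - norm) ^ param`, the 16-bit sample range, and the function name `compress` are all assumptions rather than the posted method. In Java the power call would be `Math.pow`.

```python
def compress(samples, param):
    """Boost quiet 16-bit samples using an ASSUMED curve: 1 - (1 - norm) ** param."""
    out = []
    for s in samples:
        # Normalise the magnitude to [0, 1]; the sign is restored at the end.
        norm = min(abs(s) / 32768.0, 1.0)
        boosted = 1.0 - (1.0 - norm) ** param
        value = int(round(boosted * 32767))
        out.append(value if s >= 0 else -value)
    return out
```

Under this assumed curve, very small samples come out at roughly `param` times their original level (so about double for param = 2), while samples near full scale barely change. That is the point of compression: quiet parts are lifted much more than loud ones.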
MusiGenesis
+1 just for the sheer amount of work put into the answer.
flodin
Thanks, flodin. This answer is why I don't feel guilty about getting a gold badge for my Dilbert knockoff. :)
MusiGenesis
A: 

This library is centered around audio labeling:

aubio

aubio is a library for audio labelling. Its features include segmenting a sound file before each of its attacks, performing pitch detection, tapping the beat and producing midi streams from live audio. The name aubio comes from 'audio' with a typo: several transcription errors are likely to be found in the results too.

I have had good luck with it for onset detection and pitch detection. It's written in C, but there are SWIG/Python wrappers.

Also, the author of the library has a PDF of his thesis on the page, which has great info and background about labeling.

A: 

You could try to transform the WAV signal into a graph of amplitude against time. Then a way to determine a consistent onset is to take the tangent at the inflection point of the signal's rising flank and calculate where it intersects the x axis.
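
A minimal Python sketch of this idea follows. It assumes the envelope has a single, reasonably smooth rising flank, and it treats the point of maximum slope as the inflection point. The function names `amplitude_envelope` and `tangent_onset` and the 64-sample smoothing window are made up for the example.

```python
def amplitude_envelope(samples, window=64):
    """Rectify the signal and smooth it with a moving average."""
    env = []
    running = 0.0
    for i, s in enumerate(samples):
        running += abs(s)
        if i >= window:
            running -= abs(samples[i - window])
        env.append(running / min(i + 1, window))
    return env


def tangent_onset(envelope, sample_rate):
    """Return the time (s) where the steepest tangent of the rise crosses zero."""
    if len(envelope) < 3:
        return None
    # Central-difference slope at each interior point of the envelope.
    slopes = [(envelope[i + 1] - envelope[i - 1]) / 2.0
              for i in range(1, len(envelope) - 1)]
    steepest = max(range(len(slopes)), key=lambda i: slopes[i])
    slope = slopes[steepest]
    if slope <= 0:
        return None
    idx = steepest + 1  # slopes[0] belongs to envelope[1]
    # Tangent line: y = envelope[idx] + slope * (x - idx); solve for y = 0.
    x_cross = idx - envelope[idx] / slope
    return x_cross / sample_rate
```

With several notes in one recording, the envelope would first have to be split into one segment per rising flank, since the sketch only looks for the single steepest point.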

Svante
+1  A: 

You should look at MIRToolbox - it is written for Matlab, and has an onset detector built in - it works pretty well. The source code is GPL'd, so you can implement the algorithm in whatever language works for you. What language is your production code going to use?

Jason Sundram
Thanks for the link Jason, I'll check it out now. I'm just using MATLAB for some quick tests / investigations into various methods for the different elements of the complete system. The production system will likely be written in Java, taking advantage of javax.sound.*
Alan