I'm new to this field, but I need to perform WAV-to-MIDI conversion in Java. What exactly are the steps involved in WAV-to-MIDI conversion? I have a very rough idea: sample the WAV file, filter it, use an FFT for spectral analysis, extract features, and then write the extracted features to MIDI. But I cannot find solid sources or papers on how to do all that. Can someone give me pointers on how and where to start? Are there any open-source APIs available for WAV-to-MIDI conversion?

Thanks in advance.

+9  A: 

It's a more involved process than you might imagine.

This research problem is often referred to as music transcription: the act of converting a low-level representation of music (e.g., waveform) into a higher-level representation such as MIDI or even sheet music.

The sophistication of your solution will depend upon the complexity of your input data. Tons of research papers address music transcription only on monophonic piano or drums... because they are easy to transcribe. (Relatively.) Violin is harder. Voice is even harder. Violin plus voice plus piano is much harder. A symphony is nearly impossible. You get the picture.

The basic elements of music transcription involve any of the following overlapping areas:

  1. (multi)pitch estimation
  2. instrument recognition, timbral modeling
  3. rhythm detection
  4. note onset/offset detection
  5. form/structure modeling
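
To make item 1 concrete for the simplest possible case (one monophonic note at a time), here is a minimal Java sketch of pitch estimation by autocorrelation, followed by the standard frequency-to-MIDI-note mapping (A4 = 440 Hz = note 69). The names `PitchSketch`, `estimatePitchHz`, `hzToMidiNote`, and `sine`, and the 50-2000 Hz search range, are my own assumptions for illustration. Treat it as a toy: it does not handle chords, and real transcription systems use much more robust estimators.

```java
class PitchSketch {
    /**
     * Estimate the fundamental frequency (Hz) of one monophonic frame by
     * picking the lag with the largest autocorrelation value.
     * Returns -1 if no lag in the search range has positive correlation.
     */
    public static double estimatePitchHz(double[] frame, double sampleRate,
                                         double minHz, double maxHz) {
        int minLag = (int) Math.floor(sampleRate / maxHz);
        int maxLag = Math.min((int) Math.ceil(sampleRate / minHz), frame.length - 1);
        int bestLag = -1;
        double best = 0.0;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double sum = 0.0;
            for (int i = 0; i + lag < frame.length; i++) {
                sum += frame[i] * frame[i + lag];
            }
            if (sum > best) {
                best = sum;
                bestLag = lag;
            }
        }
        return bestLag > 0 ? sampleRate / bestLag : -1.0;
    }

    /** Map a frequency to the nearest MIDI note number (A4 = 440 Hz = 69). */
    public static int hzToMidiNote(double hz) {
        return (int) Math.round(69 + 12 * Math.log(hz / 440.0) / Math.log(2.0));
    }

    /** Synthetic test tone, so the sketch runs without any audio file. */
    public static double[] sine(double hz, double sampleRate, int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = Math.sin(2 * Math.PI * hz * i / sampleRate);
        }
        return out;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0;
        double[] frame = sine(440.0, sampleRate, 4096);
        double hz = estimatePitchHz(frame, sampleRate, 50.0, 2000.0);
        System.out.println(hz + " Hz -> MIDI note " + hzToMidiNote(hz));
    }
}
```

Because the lag is a whole number of samples, the estimate is only accurate to the nearest lag; that is usually close enough to pick the right semitone in the middle of the piano range, but it gets coarse for high notes.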

Search for papers on "music transcription" on Google Scholar or from the ISMIR proceedings: http://www.ismir.net. If you are more interested in one of the above subtopics, I can point you further. Good luck.

EDIT: That being said, there are existing solutions that we can all find on the web. Feel free to try them. But as you do, evaluate them with a critical eye and ear. What types of audio signals would cause transcription to fail?

EDIT 2: Ah, you are only doing this for piano. Okay, this is doable. Music transcription has advanced to the point where it can transcribe monophonic piano pretty well. A Rachmaninov concerto will still pose problems.

Our recommendations depend upon your end goal. You state "need to perform... in Java." So it sounds like you just want something to work regardless of how it gets you there. In that case, I agree 100% with others: use something that exists.

That's actually an interesting question; all of the MIR (music information retrieval) libraries I know are C/C++/Python/Matlab, not Java. The EchoNest has a Java API, but I don't think it does note-level transcription: http://developer.echonest.com. I'll let others help you here.

Oh, Marsyas is Java-based. Cool. I thought it was just C++. http://marsyas.info/ I recommend this. It's developed by George Tzanetakis, a professor in MIR. It does signal-level analysis and should be a good option.
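
Whichever analysis library ends up doing the hard part, the output side (writing notes to a MIDI file) needs nothing beyond the JDK's own `javax.sound.midi` package. Here is a rough sketch; the class name `MidiWriteSketch`, the `toSequence` helper, and the `{pitch, velocity, startTick, lengthTicks}` note layout are assumptions of mine, not part of any library.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiEvent;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Sequence;
import javax.sound.midi.ShortMessage;
import javax.sound.midi.Track;

class MidiWriteSketch {
    /**
     * Build a one-track sequence from notes given as
     * {midiPitch, velocity, startTick, lengthTicks} (hypothetical layout).
     */
    public static Sequence toSequence(long[][] notes, int ticksPerQuarter)
            throws InvalidMidiDataException {
        Sequence seq = new Sequence(Sequence.PPQ, ticksPerQuarter);
        Track track = seq.createTrack();
        for (long[] n : notes) {
            int pitch = (int) n[0];
            int velocity = (int) n[1];
            // Note on at the start tick, note off once the length has elapsed (channel 0).
            track.add(new MidiEvent(
                    new ShortMessage(ShortMessage.NOTE_ON, 0, pitch, velocity), n[2]));
            track.add(new MidiEvent(
                    new ShortMessage(ShortMessage.NOTE_OFF, 0, pitch, 0), n[2] + n[3]));
        }
        return seq;
    }

    public static void main(String[] args) throws InvalidMidiDataException, IOException {
        // Two made-up notes: middle C, then the E above it, one quarter note each.
        long[][] notes = { {60, 90, 0, 480}, {64, 70, 480, 480} };
        Sequence seq = toSequence(notes, 480);
        File out = new File(args.length > 0 ? args[0] : "sketch.mid");
        MidiSystem.write(seq, 0, out); // type 0 = single-track standard MIDI file
        System.out.println("wrote " + out.getAbsolutePath());
    }
}
```

Note that dynamics map onto the velocity byte and timing onto ticks, so whatever you extract from the audio has to be quantized into those two ranges; the sketch sets no tempo event, so players will assume the MIDI default tempo.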

Now, if this is for a fun learning experience, I think you can use the sound manipulation utilities in Java to experiment with the WAV signal and see what comes out.
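
As a starting point for that kind of experimenting, here is a minimal sketch that uses the JDK's `javax.sound.sampled` package to load a WAV file into an array of samples and print its overall loudness. The class name `WavReadSketch` and the helpers `readMono`, `decode16`, and `rms` are my own names; the sketch assumes 16-bit signed PCM input, keeps only the first channel, and needs Java 9 or later for `readAllBytes`.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

class WavReadSketch {
    /** Decode the first channel of 16-bit signed PCM frames into values in [-1, 1). */
    public static double[] decode16(byte[] bytes, int frameSize, boolean bigEndian) {
        double[] out = new double[bytes.length / frameSize];
        for (int i = 0; i < out.length; i++) {
            int lo = bytes[i * frameSize + (bigEndian ? 1 : 0)] & 0xFF;
            int hi = bytes[i * frameSize + (bigEndian ? 0 : 1)]; // keeps the sign
            out[i] = ((hi << 8) | lo) / 32768.0;
        }
        return out;
    }

    /** Read a 16-bit signed PCM WAV file and return channel 0 as doubles. */
    public static double[] readMono(File wav)
            throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat fmt = in.getFormat();
            if (!AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())
                    || fmt.getSampleSizeInBits() != 16) {
                throw new UnsupportedAudioFileException(
                        "this sketch only handles 16-bit signed PCM");
            }
            byte[] bytes = in.readAllBytes(); // Java 9+
            return decode16(bytes, fmt.getFrameSize(), fmt.isBigEndian());
        }
    }

    /** Root-mean-square level of a block of samples, a crude loudness measure. */
    public static double rms(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v * v;
        }
        return x.length == 0 ? 0.0 : Math.sqrt(sum / x.length);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java WavReadSketch.java <file.wav>");
            return;
        }
        double[] samples = readMono(new File(args[0]));
        System.out.println(samples.length + " samples, RMS = " + rms(samples));
    }
}
```

From here you can slice `samples` into short frames and watch how the RMS level jumps at each key press, which is a first, naive look at note onsets.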

Steve
Thanks for the advice, Steve. I'm planning WAV-to-MIDI conversion only for piano performances. I thought music transcription meant only generating a digital music score; thanks for correcting me. I need to record a piano performance of a piece and generate a MIDI file from it (i.e., write a MIDI file from a WAV file), capturing most of the musical features (pitch, dynamics, timing, rhythm, phrasing, tone, articulation, etc.) for later processing. Capturing these directly from MIDI is possible, but from WAV I can't think where to start. I'll have to research the areas you mentioned first. Thanks, Steve.
Dolphin
Do I have to use Matlab for this process and integrate it with, say, Java? Thanks in advance, Steve. You really painted the bigger picture.
Dolphin
You are welcome. See my response to comments in original post.
Steve
+2  A: 

This is a very big undertaking for someone new to the field, unless you mean that you are familiar with signal analysis and feature detection in general and want to look more specifically into automatic transcription.

There is no API for WAV to MIDI conversion. Vamp is a framework for feature extraction plugins, but to do automatic transcription you would need to use all the functionality of the existing plugins, plus implement functionality that exists in none of them yet.

Browse through the descriptions of the plugins on the Vamp download page; any descriptions you do not understand are topics you should start researching if you want to do this.

Justin Smith
Thanks for the suggestions, Smith. At least now I know that there's no point in looking for APIs that do that. Can you use plug-ins from, say, Java code? For feature extraction, do I have to use algorithms? And just because I know an algorithm, how do I turn it into code? Will I have to use Matlab and integrate it with a language (say, Java)? Can you please give me a picture? Thanks again, Smith.
Dolphin
Vamp uses C and C++. Aren't there any Java plugins or similar resources? That would be most useful. How can you use plug-ins in Java code? Thanks in advance.
Dolphin
Java is not seen as often in signal processing and analysis because it is relatively CPU intensive, and until fairly recently hardware was not fast enough to do this sort of thing in Java at acceptable speeds. It looks like Steve found a good lead, though. For a higher-level approach, there is also the possibility of using an environment like Csound (http://csounds.com/), which provides a huge number of tools for synthesizing and analyzing sound with less worry about explicit memory allocation and freeing than you would get with C.
Justin Smith
Regarding how to use plugins in Java: a plugin is a dynamically loaded library, and you can use JNI to load and access it from a Java program: http://en.wikipedia.org/wiki/Java_Native_Interface
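
To give a rough idea of what the Java side of that looks like, here is a small sketch. One caveat: JNI cannot call arbitrary C functions directly, so in practice you would also write a thin C or C++ glue library that exposes JNI-style entry points and calls into the plugin. The library name `vampbridge`, the class `NativePluginSketch`, and the native method `pluginCount` are all hypothetical; no such library ships with Vamp as far as this sketch is concerned.

```java
class NativePluginSketch {
    // Hypothetical native method. Its implementation would live in a C/C++
    // glue library that you write yourself and that forwards to the plugin.
    public static native int pluginCount();

    /** Try to load a native library by name; report failure instead of crashing. */
    public static boolean tryLoad(String libName) {
        try {
            // Resolves to libNAME.so, NAME.dll, or libNAME.dylib on java.library.path.
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String lib = args.length > 0 ? args[0] : "vampbridge"; // hypothetical name
        if (tryLoad(lib)) {
            System.out.println("plugins available: " + pluginCount());
        } else {
            System.out.println("could not load native library '" + lib + "'");
        }
    }
}
```

Run as is, this just reports that the library could not be loaded, which is the expected result until the native side exists.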
Justin Smith
+2  A: 

If you don't need to automate this task (e.g., for a website where people can upload MP3s and get MIDI files back), then you should consider using a tool like Melodyne, which is already quite good at doing this. As Steve noted, this is a very difficult task to accomplish, and even the best algorithms and solutions available at the moment are not 100% reliable.

So if you are just doing studio work and need to do a few conversions, it will probably save you a bit of time (and a lot of headaches) to use a tool already designed for this task.

Nik Reiman
A: 

Dolphin, sorry to be brusque, but you have completely underestimated the problem. What you want to achieve, a full piano transcription capturing all the parameters that were used while playing, would need an enormous amount of research by people who have worked in the field for many years. Even a group of PhDs in signal processing would have to invest a lot of work to even come close to what you describe. Music transcription has needed decades of work to become even halfway reliable. I'd suggest you pick a different problem that you can manage better than this one.

Thorsten79