Given a music file, is it possible to split out each instrument that is being played? I.e. let's say I have someSong.mp3, and in that song there's vocals, guitar, bass and drums. I'd want to get 4 "tracks" - one for each distinct instrument.

I'm guessing that it's almost impossible to do this, given that instruments can overlap, and it's notoriously difficult to distinguish overlapping voices let alone instruments.

However, if there is a library, an algorithm, or SOME way of doing this, I'd be curious to hear how.

+1  A: 

The easiest way to do this is to maintain the instruments separately in the first place, which is why many intermediate musical processing applications use MIDI to store instrument messages in an abstract form on separate tracks.

mquander
+2  A: 

Every instrument has a characteristic spurious oscillation, so if you isolate single notes (with a DFT/FFT) and compare the other frequencies at that point of time, you might distinguish at least different types of instruments.

tstenner
I don't think you will get very far with only DFTs, but this is the only answer I'd consider even remotely correct so far. Separating that many instruments is no doubt a really hard signal processing task. You might have some luck picking out different frequencies, but that's it.
kigurai
DFT/FFT does not have enough resolution to separate bass notes unless you make your FFT window *very* large. And it does not do any magic to pick apart the harmonics of different notes played at the same time.
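To put a rough number on "very large": the spacing between FFT bins is the sample rate divided by the window length. The sketch below estimates the window needed to tell a low bass note from the semitone above it. The 44.1 kHz sample rate, the 41.2 Hz low E and the helper name `min_window_samples` are assumptions chosen for illustration, and the criterion is optimistic because a real window function smears each peak over several bins.

```python
import math

def min_window_samples(f_low_hz, sample_rate_hz):
    """Smallest FFT length whose bin spacing (sample_rate / N) is no wider
    than the gap between f_low_hz and the next semitone above it."""
    semitone_gap_hz = f_low_hz * (2 ** (1 / 12) - 1)
    return math.ceil(sample_rate_hz / semitone_gap_hz)

SAMPLE_RATE = 44100   # assumed CD-quality audio
LOW_E_BASS = 41.2     # approximate low E on a bass guitar, in Hz

n = min_window_samples(LOW_E_BASS, SAMPLE_RATE)
window_seconds = n / SAMPLE_RATE
print(n, "samples, about", round(window_seconds, 2), "seconds per window")
```

Under these assumptions the window comes out at roughly 0.4 seconds, which is far too long to follow fast notes.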
Nils Pipenbrinck
Tough to get much certainty this way, especially with recording of real instruments (as opposed to synthesized signals), but it seems like the cleanest approach if you *must* try to pick a single track apart. Not easy.
dmckee
A: 

mp3 is a lossy format. It works by modeling human hearing and throwing away information that it judges a person can't hear. In essence, when louder instruments overshadow softer ones, the softer ones get ditched. This means that you will never be able to extract what originally went into the mp3, let alone separate out the instruments.

I have a friend who is a sound engineer and he always has to say "No" to people who ask him to re-engineer a track recorded as an mp3.

A few weeks ago I saw a study that suggested that the younger generation actually preferred the sound of mp3s over more complete formats, as that is what they had grown up with.

Peter M
I'd be willing to settle for a non lossless format, like OGG. Essentially: any widely available format.
FreeMemory
Then I'd suggest tstenner's approach. Get the music into the time domain, convert to the frequency domain, scan for bands of frequencies and then try to match them up with notes produced by separate instruments. Sounds like a research-level project to me!
Peter M
A: 

Long story short: you can't, except in the case that your four instruments are synthesizers playing pure sine waves.

Nils Pipenbrinck
Pure sine waves aren't (theoretically) required. You must know the *linearly independent* distribution of harmonics for each instrument. And you must have enough data on each note, which puts some constraints on how complex and similar the distributions can be.
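A toy illustration of the linear-independence point, not a working separator: if each instrument's harmonic profile is known and the profiles are linearly independent, the observed harmonic magnitudes of a shared note can be unmixed with least squares. The two profiles and the gains below are made up, and the sketch assumes magnitudes simply add, which ignores phase.

```python
import numpy as np

# Hypothetical harmonic profiles: relative magnitude of harmonics 1..4
# for two invented instruments. Real profiles vary with pitch and loudness.
flute_like = np.array([1.0, 0.3, 0.1, 0.05])
sax_like = np.array([1.0, 0.8, 0.6, 0.4])

# One column per instrument, one row per harmonic.
templates = np.column_stack([flute_like, sax_like])

# Simulate both instruments playing the same note at different volumes.
true_gains = np.array([0.7, 0.2])
observed = templates @ true_gains

# Solve templates @ gains = observed in the least-squares sense.
estimated_gains, residuals, rank, _ = np.linalg.lstsq(templates, observed, rcond=None)
print("estimated gains:", estimated_gains, "rank:", rank)
```

If the two profiles were proportional to each other, the rank would drop to 1 and the gains could not be told apart.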
dmckee
+11  A: 

My undergraduate project dealt with transcribing notes from a WAV file to a MIDI file. We handled only the simple case of one instrument, possibly playing more than one note at a time (a piano, for instance). Our research into the subject before we started showed that even this (i.e. only one instrument) is considered non-trivial. Basically, the problem is:

  • Find what frequencies are playing at any given time. This can be done by a DFT/FFT of small windows, one at a time.
  • Use some heuristic to guess which frequencies are harmonics of the same note, and which belong to different notes. This may be easy if you know what instrument is playing, but it's hard in the general case, because the magnitude of each harmonic differs by instrument. For instance, you might have two Cs one octave apart from one instrument, or you might have one C but from a different instrument.
  • After you know what notes are playing at each time, you have to guess where the breaks between notes are. You could have one long note or a series of short notes. Depending on the size of the windows you used for the initial DFT, you could get different results here.
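The first step can be sketched roughly as follows. This is a minimal illustration with NumPy, not the code from the project: the function name `dominant_frequencies`, the window and hop sizes, and the synthetic two-note test signal are all assumptions.

```python
import numpy as np

def dominant_frequencies(signal, sample_rate, window_size=4096, hop=2048, n_peaks=2):
    """Return, for each window, the n_peaks strongest spectral peaks in Hz."""
    window = np.hanning(window_size)
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    results = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = signal[start:start + window_size] * window
        magnitude = np.abs(np.fft.rfft(frame))
        # Keep only local maxima so neighbouring bins of one peak count once.
        is_peak = (magnitude[1:-1] > magnitude[:-2]) & (magnitude[1:-1] >= magnitude[2:])
        peak_bins = np.nonzero(is_peak)[0] + 1
        order = np.argsort(magnitude[peak_bins])[::-1][:n_peaks]
        results.append(np.sort(freqs[peak_bins[order]]))
    return results

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of audio
# Two simultaneous pure tones: A4 (440 Hz) and E5 (about 659.26 Hz).
signal = np.sin(2 * np.pi * 440.0 * t) + 0.8 * np.sin(2 * np.pi * 659.26 * t)

frames = dominant_frequencies(signal, sample_rate)
for i, peaks in enumerate(frames):
    print("window", i, "peaks near", np.round(peaks, 1), "Hz")
```

Pure tones make this look easy. A real instrument adds a stack of harmonics to every note, which is exactly where the second step becomes hard.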

Now, if you have more than one instrument at a time, and no two are playing the same notes or harmonics thereof at one time, you might be able to tell the instruments apart using some heuristic on the magnitudes of the harmonics or on the sequences of notes they're playing. Most likely there will be times when two instruments are playing the same note. Then you don't really have any way to decide if there is (a) one instrument playing the note, (b) two instruments playing at the same volume, (c) one playing soft and the other playing loud or (d) any combination thereof.

Anyway, that's the short list of problems to solve. I don't know of any algorithm that solves this in the general case. I don't think this problem has been solved yet.

Edit: My project presentation can be found at http://www-sipl.technion.ac.il/new/Archive/Special_Events/sipl2004/Projects_PowerPoint/WAV-to-MIDI.pdf

Nathan Fellman
Is real world experience actually allowed here? Huh? HUH?
dmckee
Nice description of the problem, BTW.
dmckee
Thanks for this -- interesting read.
mquander
Thanks for the interesting answer. I'm going to mark it as the accepted answer although it doesn't technically SOLVE the problem, it provides some very interesting food for thought. Thanks! :)
FreeMemory
+1  A: 

Basically, what you want to do is very difficult to do programmatically, although no longer impossible.

http://www.youtube.com/watch?v=jFCjv4_jqAY

Nick
It remains to be seen how effective this is... Celemony have been pretty quiet about it for a while, though it's still promised for “Spring 2009”. I'll be very impressed if it's half as good as the demo; this is a really hard task.
bobince
And it can't actually distinguish instruments: "what it does not do is detect and distinguish the different instruments playing the tones... if a flute and a saxophone play the same tone... a blob will appear representing both the tone played by the flute and that played by the saxophone."
FreeMemory
+2  A: 

I have actually bumped into a very interesting algorithm called ICA (Independent Component Analysis). The concept behind this algorithm doesn't come from the signal processing world, but from probability theory. We used it to separate two songs that were mixed into a single mp3 file. You can find an implementation library for Matlab, C++ and Python called FastICA here. Give it a shot; it's really nice.
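For a feel of how it works, here is a minimal hand-rolled sketch of the FastICA iteration in NumPy (tanh nonlinearity, symmetric decorrelation). It is not the API of the library mentioned above, and the function names, test signals and mixing matrix are made up. It also assumes two observed mixtures of two sources, for example two channels with different mixes; as far as I know, standard ICA needs at least as many observed channels as sources.

```python
import numpy as np

def sym_decorrelate(w):
    """Make the rows of w orthonormal: W <- (W W^T)^(-1/2) W."""
    eigvals, eigvecs = np.linalg.eigh(w @ w.T)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T @ w

def fast_ica(mixtures, n_iter=200, tol=1e-8, seed=0):
    """mixtures has shape (n_channels, n_samples). Returns estimated sources
    of the same shape; their order, sign and scale are arbitrary."""
    x = mixtures - mixtures.mean(axis=1, keepdims=True)
    # Whiten: decorrelate the channels and scale them to unit variance.
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    z = (eigvecs / np.sqrt(eigvals)).T @ x
    n_channels, n_samples = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n_channels, n_channels)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        g_prime_mean = (1.0 - g ** 2).mean(axis=1)
        # Fixed-point update: E[g(w.z) z] - E[g'(w.z)] w, for every row at once.
        w_new = sym_decorrelate((g @ z.T) / n_samples - np.diag(g_prime_mean) @ w)
        converged = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w @ z

# Two synthetic "songs": a sine wave and a sawtooth wave.
t = np.arange(8000) / 8000.0
sources = np.vstack([np.sin(2 * np.pi * 440 * t),
                     2.0 * ((293 * t) % 1.0) - 1.0])
mixing = np.array([[1.0, 0.5],
                   [0.4, 1.0]])  # made-up mixing matrix
mixtures = mixing @ sources

estimated = fast_ica(mixtures)
# Absolute correlation between each true source and each estimate.
corr = np.abs(np.corrcoef(np.vstack([sources, estimated]))[:2, 2:])
print(np.round(corr, 3))
```

Each true source should correlate strongly with exactly one of the estimates. Note that this separates mixtures, not instruments within a single mono recording.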

LiorH