Given a 1D array of values, what is the simplest way to find the best-fit bimodal distribution for it, where each 'mode' is a normal distribution? In other words, how can you find the combination of two normal distributions that best reproduces the 1D array of values?

Specifically, I'm interested in implementing this in Python, but answers don't have to be language-specific.

Thanks!

A: 

I suggest using the awesome scipy package. It provides a few methods for optimisation.

There's a big fat caveat with simply applying a predefined least-squares fit or something along those lines.

Here are a few problems you will run into:

  1. Noise - the noise is larger than the second peak (or both peaks).
  2. Partial peak - your data is cut off at one of the borders.
  3. Sampling - the peaks are narrower than your sampling resolution.
  4. It isn't normal - you'll still get some result ...
  5. Overlap - if the peaks overlap, you'll often find that one peak is fitted correctly but the second one's amplitude will approach zero...
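For reference, the kind of plain least-squares fit this answer warns about might look like the sketch below: it histograms the values and fits a sum of two Gaussian bumps with `scipy.optimize.curve_fit`. The helper names `two_gaussians` and `fit_bimodal_histogram`, the bin count, and the quartile-based starting guess are all illustrative assumptions, not part of the original answer. The starting guess `p0` matters a lot; with overlapping peaks a poor guess is exactly how you end up with one component collapsing toward zero.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(x, a1, mu1, s1, a2, mu2, s2):
    """Sum of two Gaussian bumps with free (unnormalised) amplitudes a1, a2."""
    return (a1 * np.exp(-0.5 * ((x - mu1) / s1) ** 2)
            + a2 * np.exp(-0.5 * ((x - mu2) / s2) ** 2))

def fit_bimodal_histogram(values, bins=50, p0=None):
    """Hypothetical helper: least-squares fit of two Gaussians to a histogram.

    Returns [(amplitude, mean, sigma), ...] for both bumps, sorted by mean."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    if p0 is None:
        # Crude starting guess (an assumption): peaks at the quartiles,
        # half the overall spread each, equal heights.
        q25, q75 = np.percentile(values, [25, 75])
        s = values.std() / 2
        peak = counts.max()
        p0 = [peak, q25, s, peak, q75, s]
    params, _cov = curve_fit(two_gaussians, centers, counts, p0=p0)
    a1, mu1, s1, a2, mu2, s2 = params
    # sigma enters squared, so the optimiser may return a negative value.
    comps = [(a1, mu1, abs(s1)), (a2, mu2, abs(s2))]
    return sorted(comps, key=lambda c: c[1])
```

Usage, on synthetic data drawn from two known normals:

```python
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 3000), rng.normal(3, 1.0, 2000)])
print(fit_bimodal_histogram(data))
```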
phoku
A: 

I would check out this SO Question and this Wikipedia Article.

SciPy has a few ways to fit and optimize data.
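One way to use SciPy's optimisers here without binning at all is to maximise the likelihood of the raw values under a two-component normal mixture directly. The sketch below does that with `scipy.optimize.minimize`; the parameterisation (logit of the weight, log of each standard deviation), the quartile-based starting point, and the helper names are assumptions chosen for illustration, not something the linked pages prescribe.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm

def neg_log_likelihood(theta, x):
    """Mean negative log-likelihood of a two-component normal mixture.

    theta = (logit_w, mu1, log_s1, mu2, log_s2); the logit/log transforms
    keep the weight in (0, 1) and both standard deviations positive."""
    logit_w, mu1, log_s1, mu2, log_s2 = theta
    log_w1 = -np.logaddexp(0.0, -logit_w)   # log(w), computed stably
    log_w2 = -np.logaddexp(0.0, logit_w)    # log(1 - w), computed stably
    log_pdf = np.logaddexp(log_w1 + norm.logpdf(x, mu1, np.exp(log_s1)),
                           log_w2 + norm.logpdf(x, mu2, np.exp(log_s2)))
    # The mean (rather than the sum) has the same minimiser but keeps the
    # objective on a scale the optimiser handles comfortably.
    return -log_pdf.mean()

def fit_mixture_mle(x):
    """Hypothetical helper: [(weight, mean, std), ...] sorted by mean."""
    x = np.asarray(x, dtype=float)
    q25, q75 = np.percentile(x, [25, 75])
    log_s = np.log(x.std() / 2)
    theta0 = np.array([0.0, q25, log_s, q75, log_s])  # w = 0.5 to start
    res = minimize(neg_log_likelihood, theta0, args=(x,), method="BFGS")
    logit_w, mu1, log_s1, mu2, log_s2 = res.x
    w = expit(logit_w)
    comps = [(w, mu1, np.exp(log_s1)), (1.0 - w, mu2, np.exp(log_s2))]
    return sorted(comps, key=lambda c: c[1])
```

Because it works on the raw values, this avoids the bin-width choice, but it still depends on a sensible starting point when the peaks overlap heavily.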

Nick Presta
Your links seem a bit messed up: the SO question and Wikipedia article links point to the question, the scipy link points to the Wikipedia article, and there's no scipy link. :-)
wds
Thanks to Andre Miller for fixing my links. :-)
Nick Presta
A:

What you are trying to do is called a Gaussian mixture model. The standard approach to fitting one is Expectation Maximization (EM). SciPy's SVN includes a section on machine learning and EM called scikits. I use it a fair bit.
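The scikits machine-learning code later grew into scikit-learn, which provides `sklearn.mixture.GaussianMixture` for this. To show what EM actually does for two 1D components, here is a minimal from-scratch NumPy sketch; it is an illustration, not the scikits implementation, and the quartile initialisation, iteration cap, and tolerance are assumptions.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Normal density, vectorised over broadcastable arrays."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_two_gaussians(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns (weights, means, sigmas), each of length 2, sorted by mean."""
    x = np.asarray(x, dtype=float)
    # Initialisation (an assumption): means at the quartiles,
    # half the overall spread for each component, equal weights.
    mu = np.percentile(x, [25, 75])
    sigma = np.full(2, x.std() / 2)
    w = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = w * normal_pdf(x[:, None], mu, sigma)            # shape (n, 2)
        total = np.maximum(dens.sum(axis=1, keepdims=True), np.finfo(float).tiny)
        resp = dens / total
        # Log-likelihood of the parameters used in this E-step.
        ll = np.log(total).sum()
        # M-step: responsibility-weighted weights, means and std devs.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        # Stop once the log-likelihood no longer improves meaningfully.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = np.argsort(mu)
    return w[order], mu[order], sigma[order]
```

Unlike the histogram least-squares fit, EM works on the raw values and every iteration is guaranteed not to decrease the likelihood, though it can still settle in a local optimum or let one sigma shrink toward zero if a component latches onto a single point.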

whatnick