As the title says, I have some data that is roughly bimodal (a mixture of two normal distributions) and I would like to recover its two underlying components.

I am fitting the data distribution with the sum of two normal densities with means m1 and m2 and standard deviations s1 and s2. The two Gaussians are scaled by weight factors w1 and w2 such that w1 + w2 = 1.

I can do this using the vglm function from the VGAM package, like so:

    fitRes <- vglm(mydata ~ 1,
                   mix2normal1(equalsd = FALSE, iphi = w, imu1 = m1,
                               imu2 = m2, isd1 = s1, isd2 = s2))

This is painfully slow and can take several minutes depending on the data, but I can live with that.
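For what it's worth, the standard EM updates for a two-component normal mixture are simple enough to hand-roll in base R, and packages such as mixtools (normalmixEM) implement the same idea. Below is a minimal sketch, not the vglm method itself; the function name and argument defaults are mine:

```r
## Minimal EM sketch for a two-component normal mixture, base R only.
## Only one weight w is estimated, so w1 + w2 = 1 holds by construction.
em_mix2norm <- function(x, w = 0.5, m1 = min(x), m2 = max(x),
                        s1 = sd(x), s2 = sd(x), tol = 1e-8, maxit = 1000) {
  for (i in seq_len(maxit)) {
    ## E-step: posterior probability that each point came from component 1
    d1 <- w * dnorm(x, m1, s1)
    d2 <- (1 - w) * dnorm(x, m2, s2)
    g  <- d1 / (d1 + d2)
    ## M-step: responsibility-weighted parameter updates
    w.new  <- mean(g)
    m1.new <- sum(g * x) / sum(g)
    m2.new <- sum((1 - g) * x) / sum(1 - g)
    s1.new <- sqrt(sum(g * (x - m1.new)^2) / sum(g))
    s2.new <- sqrt(sum((1 - g) * (x - m2.new)^2) / sum(1 - g))
    done <- abs(w.new - w) + abs(m1.new - m1) + abs(m2.new - m2) < tol
    w <- w.new; m1 <- m1.new; m2 <- m2.new; s1 <- s1.new; s2 <- s2.new
    if (done) break
  }
  list(w = w, m1 = m1, m2 = m2, s1 = s1, s2 = s2, iterations = i)
}
```

On a simulated mixture like `c(rnorm(500, 0, 1), rnorm(500, 5, 1))` this converges in a fraction of a second, so repeating it over 30-50 blocks is cheap.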

Now I would like to see how the distribution of my data changes over time, so essentially I break my data up into a few (30-50) blocks and repeat the fit for each of them.

So, here are the questions:

1) How do I speed up the fitting process? I tried nls and mle, which look much faster, but mostly failed to get good fits (though I did manage to trigger every possible error those functions could throw at me). It is also not clear to me how to impose constraints with those functions (w in [0, 1] and w1 + w2 = 1).
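A standard way to impose those limits with a general optimizer is to reparameterize so the constraints hold by construction: estimate a single unconstrained parameter and map it through plogis() so w lands in (0, 1) (w2 is then automatically 1 - w), and optimize log standard deviations so s1, s2 stay positive. A sketch with base R's optim(); function names and the quantile-based starting values are my own choices:

```r
## Negative log-likelihood of a two-component normal mixture with
## unconstrained parameters: par = (logit(w), m1, m2, log(s1), log(s2))
nll <- function(par, x) {
  w  <- plogis(par[1])                  # maps (-Inf, Inf) -> (0, 1)
  m1 <- par[2]; m2 <- par[3]
  s1 <- exp(par[4]); s2 <- exp(par[5])  # maps (-Inf, Inf) -> (0, Inf)
  -sum(log(w * dnorm(x, m1, s1) + (1 - w) * dnorm(x, m2, s2)))
}

fit_mix2norm <- function(x, start = c(0, quantile(x, 0.25), quantile(x, 0.75),
                                      log(sd(x)), log(sd(x)))) {
  res <- optim(start, nll, x = x, method = "BFGS")
  p <- res$par
  ## transform back to the constrained scale for reporting
  list(w = plogis(p[1]), m1 = p[2], m2 = p[3],
       s1 = exp(p[4]), s2 = exp(p[5]), negloglik = res$value)
}
```

The same transform trick works inside the minuslogl function you hand to mle, since mle just drives optim underneath.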

2) How do I automagically choose good starting parameters (I know this is a million-dollar question, but you never know, maybe someone has the answer)? Right now I have a little interface that allows me to choose the parameters and visually preview what the initial distribution would look like, which is very cool, but I would like to do it automatically for this task.

I thought of using the x values corresponding to the 3rd and 4th quartiles of y as starting parameters for the two means. Do you think that would be a reasonable thing to do?
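Quantile-based starts in that spirit often work; another common heuristic is a quick 2-cluster k-means, which gives starting means, SDs, and a weight in one shot. A sketch using base R's kmeans (the helper name is mine):

```r
## Derive starting values for a two-component normal mixture from a
## quick 2-cluster k-means; components are ordered by increasing mean.
auto_start <- function(x) {
  km <- kmeans(x, centers = 2, nstart = 5)
  o  <- order(km$centers)          # which cluster has the smaller center
  cl <- match(km$cluster, o)       # relabel so component 1 is the lower one
  list(w  = mean(cl == 1),
       m1 = km$centers[o[1]], m2 = km$centers[o[2]],
       s1 = sd(x[cl == 1]),   s2 = sd(x[cl == 2]))
}
```

The resulting list can seed whichever fitter you end up using; when the components overlap heavily the cluster SDs will be biased low, but as starting values that is usually good enough.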

+1  A: 

First things first:

There has been a lot of research into mixture models so you may find something.

Dirk Eddelbuettel
Hi Dirk, unfortunately I know that the problem of mixture models is far from trivial... the second link seems very interesting. Would you suggest some specific packages to try among those? Thanks!
nico