tags:

views:

109

answers:

2

Hello. I recently posted this to reddit and it was suggested I come here, so here I am.

I'm a student working on an epidemiology model in R, using maximum likelihood methods. I created my negative log likelihood function. It's sort of gross looking, but here it is:

NLLdiff = function(v1, CV1, v2, CV2, st1 = (czI01 - czV01), st2 = (czI02 - czV02), st01 = czI01, st02 = czI02, tt1 = czT01, tt2 = czT02) { 
prob1 = (1 + v1 * CV1 * tt1)^(-1/CV1)
prob2 = ( 1 + v2 * CV2 * tt2)^(-1/CV2) 
-(sum(dbinom(st1, st01, prob1, log = T)) + sum(dbinom(st2, st02, prob2, log = T)))
 }

The reason the first line looks so awful is because most of the data it takes is inputted there. czI01, for example, is already declared. I did this simply so that my later calls to the function don't all have to have awful vectors in them.

I then optimized for CV1, CV2, v1 and v2 using mle2 (library bbmle). That's also a bit gross looking, and looks like:

ml.cz.diff = mle2 (NLLdiff, start=list(v1 = vguess, CV1 = cguess, v2 = vguess, CV2 = cguess), method="L-BFGS-B", lower = 0.0001)

Now, everything works fine up until here. ml.cz.diff gives me values that I can turn into a plot that reasonably fits my data. I also have several different models, and can get AICc values to compare them. However, when I try to get confidence intervals around v1, CV1, v2 and CV2 I have problems. Basically, I get a negative bound on CV1, which is impossible as it actually represents a square number in the biological model as well as some warnings. The warnings are this: http://i.imgur.com/B3H2l.png . Is there a better way to get confidence intervals? Or, really, a way to get confidence intervals that make sense here?

What I see happening is that, by coincidence, my hessian matrix is singular for some values in the optimization space. But, since I'm optimizing over 4 variables and don't have overly extensive programming knowledge, I can't come up with a good method of optimization that doesn't rely on the hessian. I have googled the problem - it suggested that my model's bad, but I'm reconstructing some work done before which suggests that my model's really not awful (the plots I make using the ml.cz.diff look like the plots of the original work). I have also read the relevant parts of the manual as well as Bolker's book Ecological Models in R. I have also tried different optimization methods, which resulted in a longer run time but the same errors. The "SANN" method didn't finish running within an hour, so I didn't wait around to see the result.

tl;dr : my confidence intervals are bad, is there a relatively straightforward way to fix them in R.

My vectors are:

czT01 = c(5, 5, 5, 5, 5, 5, 5, 25, 25, 25, 25, 25, 25, 25, 50, 50, 50, 50, 50, 50, 50)
czT02 = c(5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 25, 25, 25, 25, 25, 50, 50, 50, 50, 50, 75, 75, 75, 75, 75)
czI01 = c(25, 24, 22, 22, 26, 23, 25, 25, 25, 23, 25, 18, 21, 24, 22, 23, 25, 23, 25, 25, 25)
czI02 = c(13, 16, 5, 18, 16, 13, 17, 22, 13, 15, 15, 22, 12, 12, 13, 13, 11, 19, 21, 13, 21, 18, 16, 15, 11)
czV01 = c(1, 4, 5, 5, 2, 3, 4, 11, 8, 1, 11, 12, 10, 16, 5, 15, 18, 12, 23, 13, 22)
czV02 = c(0, 3, 1, 5, 1, 6, 3, 4, 7, 12, 2, 8, 8, 5, 3, 6, 4, 6, 11, 5, 11, 1, 13, 9, 7)

and I get my guesses by:

v = -log((c(czI01, czI02) - c(czV01, czV02))/c(czI01, czI02))/c(czT01, czT02)
vguess = mean(v)
cguess = var(v)/vguess^2

It's also possible that I'm doing something else completely wrong, but my results seem reasonable so I haven't caught it.

A: 

Also, when I plot the profile I get is this: http://i.imgur.com/71dVE.png , which explains why the confidence intervals are bad, although not how to fix the problem.

AmalieNot
You should edit the original question with this rather than posting it as an "answer".
Shane
+1  A: 

You could change the parameterization so that the constraints are always satisfied. Rewrite the likelihood as a a function of ln(CV1) and ln(CV2), that way you can be sure that CV1 and CV2 remain strictly positive.

NLLdiff_2 = function(v1, lnCV1, v2, lnCV2, st1 = (czI01 - czV01), st2 = (czI02 - czV02), st01 = czI01, st02 = czI02, tt1 = czT01, tt2 = czT02) { 
prob1 = (1 + v1 * exp(lnCV1) * tt1)^(-1/exp(lnCV1))
prob2 = ( 1 + v2 * exp(lnCV2) * tt2)^(-1/exp(lnCV2)) 
-(sum(dbinom(st1, st01, prob1, log = T)) + sum(dbinom(st2, st02, prob2, log = T)))
 }
fabians