I know there is a coxphfit function in MATLAB for Cox regression, but I have trouble understanding how to apply it.

1) How do I compare two groups of samples with survival data in days (survdays), censoring (cens) and a predictor value (x)? The groups are defined by the logical variable groups, and they have different numbers of samples.

2) What is the baseline parameter in coxphfit? I have read the documentation, but how should I choose the baseline properly?

It would be great if you know of a site with good, detailed examples of survival analysis on medical data. I found only the MathWorks demo, which does not even mention coxphfit.

Do you know of another third-party function for Cox regression?

A:

In survival analysis, the hazard function is the instantaneous death rate: the rate at which people who have survived up to time t die at that moment.

In these analyses, you are typically measuring the effect that something has on this hazard function. For example, you might ask "does swallowing arsenic increase the rate at which people die?". The baseline (background) hazard is the rate at which people would die anyway (here, without swallowing arsenic).
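For reference, the two ideas above in the usual textbook notation (my notation, not taken from the MATLAB or R documentation): the hazard at time t for a survival time T, and the Cox proportional hazards model, in which a predictor x multiplies a baseline hazard h_0(t) by a factor e^{βx}:

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad
h(t \mid x) = h_0(t)\, e^{\beta x}
```

Here h_0(t) is the hazard when x = 0 (for example, no arsenic), and e^{β} is the hazard ratio for a one-unit increase in x.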

If you read the docs for coxphfit carefully, you will notice that the function estimates the baseline hazard; it is not something that you enter. The baseline argument only sets the X values at which that hazard is computed:

baseline: "The X values at which to compute the baseline hazard."
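A short derivation of why this choice matters less than it seems, using the standard Cox model (my notation, not the MathWorks documentation's). Evaluating the baseline hazard at some covariate value c instead of 0 only multiplies it by the constant e^{βc}, and that constant cancels from every factor of the partial likelihood, so the estimated β does not depend on c. Here δ_i = 1 if individual i died (was not censored) and R(t_i) is the set of individuals still at risk just before t_i:

```latex
h(t \mid x) = h_0(t)\, e^{\beta x}
            = \underbrace{h_0(t)\, e^{\beta c}}_{\text{baseline hazard at } x = c}\, e^{\beta (x - c)}

L(\beta) = \prod_{i:\,\delta_i = 1}
           \frac{e^{\beta (x_i - c)}}{\sum_{j \in R(t_i)} e^{\beta (x_j - c)}}
         = \prod_{i:\,\delta_i = 1}
           \frac{e^{\beta x_i}}{\sum_{j \in R(t_i)} e^{\beta x_j}}
```

So, under this reading, c = mean(X) and c = 0 give the same coefficients and differ only in which baseline hazard curve is reported.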

EDIT: MATLAB's coxphfit function doesn't obviously work with grouped data. If you are happy to switch to R, the analysis is a one-liner.

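library(survival)

# Create some data
set.seed(1)                      # make the random example reproducible
n <- 20
dfr <- data.frame(
  survdays = runif(n, 5, 15),
  cens     = runif(n) < .3,      # TRUE = censored
  x        = rlnorm(n),
  groups   = rep(c("first", "second"), each = n / 2)
)

# The Cox PH analysis. Surv() expects an event indicator (TRUE = died),
# so the censoring flag is negated.
summary(coxph(Surv(survdays, !cens) ~ x / groups, data = dfr))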
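Note that in an R formula, x / groups expands to x + x:groups: a main effect of x plus a separate x slope within each group, with no main effect for groups itself. If what you want for question 1 is instead a hazard ratio between the two groups, two common alternatives are sketched below. The simulated data and group sizes are made up for illustration; unequal group sizes are not a problem for Cox regression.

```r
library(survival)

set.seed(2)
n <- 30
dfr <- data.frame(
  survdays = runif(n, 5, 15),
  cens     = runif(n) < 0.3,                               # TRUE = censored
  x        = rlnorm(n),
  groups   = rep(c("first", "second"), times = c(12, 18))  # unequal group sizes
)

# What x / groups expands to
labels(terms(~ x / groups))   # c("x", "x:groups")

# Alternative 1: groups as a covariate. exp(coef) for "groupssecond" in the
# summary is the x-adjusted hazard ratio of "second" vs "first".
fit_cov <- coxph(Surv(survdays, !cens) ~ x + groups, data = dfr)
summary(fit_cov)

# Alternative 2: a separate baseline hazard for each group and a common effect of x
fit_strata <- coxph(Surv(survdays, !cens) ~ x + strata(groups), data = dfr)
summary(fit_strata)
```

The covariate version tests whether the groups differ in hazard; the stratified version does not estimate a group effect at all, but lets the groups have differently shaped baseline hazards.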

ANOTHER EDIT: The baseline parameter to MATLAB's coxphfit appears to be a normalising constant. R's coxph function doesn't have an equivalent parameter. I looked in Statistical Computing by Michael Crawley, and it seems to suggest that the baseline hazard isn't important, since it cancels out of the (partial) likelihood: the probability that a particular individual is the one who dies at a given event time. See Chapter 33, and pp. 615-616 in particular. My knowledge of how the model works isn't deep enough to explain the discrepancy between the MATLAB and R implementations; perhaps you could ask on the Stack Exchange Stats Analysis site.
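On the R side, coxph() itself has no such argument, but survival::basehaz() has a centered argument that seems to play a similar role (this is my reading of the survival package documentation; check ?basehaz for your version). With centered = TRUE it reports the cumulative baseline hazard at the covariate means stored in fit$means, analogous to coxphfit's default of mean(X); with centered = FALSE it reports it at x = 0. A minimal sketch on simulated, made-up data:

```r
library(survival)

set.seed(1)                              # reproducible simulated data
n <- 50
dfr <- data.frame(
  survdays = runif(n, 5, 15),
  cens     = runif(n) < 0.3,             # TRUE = censored
  x        = rlnorm(n)
)

# Surv() wants an event indicator (TRUE = died), so negate the censoring flag
fit <- coxph(Surv(survdays, !cens) ~ x, data = dfr)

bh_mean <- basehaz(fit, centered = TRUE)   # cumulative baseline hazard at x = fit$means
bh_zero <- basehaz(fit, centered = FALSE)  # cumulative baseline hazard at x = 0

# The two curves differ only by the constant factor exp(beta * mean(x));
# coef(fit) is the same whichever one you look at.
scale_factor <- exp(sum(coef(fit) * fit$means))
head(cbind(time = bh_mean$time, at_mean = bh_mean$hazard, at_zero = bh_zero$hazard))
```

So for a predictor like age, the centered curve is the hazard of an average-aged patient; for a 0/1 exposure like arsenic, centered = FALSE gives the unexposed baseline.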

Richie Cotton
@Richie Cotton: Thanks a lot, especially for the R script. But as for baseline, it is a parameter of the function. It's mean(X) by default, and can be 0. I think in your example with arsenic (As), the baseline should be 0, so that we compare the hazard of swallowing vs. not swallowing. But if I use age as a predictor, I think the mean makes more sense. How can I control it in the R script?
yuk
Thanks a lot, Richie. After some reading and trying I think I understand it much better now.
yuk
And special thanks for pointing me to the Stats SE site. I didn't know about it.
yuk
