In survival analysis, the hazard function is the instantaneous death rate.
In these analyses, you are typically measuring what effect something has on this hazard function. For example, you may ask "does swallowing arsenic increase the rate at which people die?" The baseline hazard is the rate at which people would die anyway (without swallowing arsenic, in this case).
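To make the two terms concrete, here is the usual textbook way of writing them. The notation (T for the survival time, x for a covariate such as arsenic dose, beta for its coefficient, h_0 for the baseline hazard) is my own shorthand and not taken from the MATLAB docs.

```latex
% Hazard: instantaneous death rate at time t, given survival up to t
h(t) = \lim_{\Delta t \to 0}
       \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% Cox proportional hazards model: the covariate scales the baseline hazard
h(t \mid x) = h_0(t) \, \exp(\beta x)
```

In this form the question "does arsenic increase the death rate?" becomes "is beta greater than zero?", and h_0(t) is whatever hazard remains when x = 0.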
If you read the docs for coxphfit carefully, you will notice that the function tries to calculate the baseline hazard; it is not something that you enter.
baseline: The X values at which to compute the baseline hazard.
EDIT: MATLAB's coxphfit function doesn't obviously work with grouped data. If you are happy to switch to R, then the analysis is a one-liner.
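library(survival)
# Create some example data
n <- 20
dfr <- data.frame(
  survdays = runif(n, 5, 15),   # follow-up time in days
  cens = runif(n) < .3,         # passed to Surv() as the event indicator: TRUE = death observed
  x = rlnorm(n),                # a continuous covariate
  groups = rep(c("first", "second"), each = n / 2)
)
# The Cox PH analysis; x / groups expands to x + x:groups
summary(coxph(Surv(survdays, cens) ~ x / groups, dfr))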
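If you do want to see the baseline hazard itself, the survival package can estimate it from a fitted model with basehaz(). The following is only a sketch: it simulates its own data, and the seed, sample size, event rate, and the column name died are arbitrary choices of mine for illustration.

```r
library(survival)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 50      # illustrative sample size (an assumption)
dfr <- data.frame(
  survdays = runif(n, 5, 15),
  died     = runif(n) < 0.7,   # event indicator: TRUE = death observed
  x        = rlnorm(n)
)

fit <- coxph(Surv(survdays, died) ~ x, data = dfr)

# Cumulative baseline hazard, estimated from the fit rather than supplied.
# centered = FALSE evaluates it at x = 0; the default uses the covariate means.
bh <- basehaz(fit, centered = FALSE)
head(bh)  # columns: hazard (cumulative), time
```

Note that basehaz() returns the cumulative baseline hazard, so the values in the hazard column never decrease over time.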
ANOTHER EDIT: That baseline parameter to MATLAB's coxphfit appears to be a normalising constant. R's coxph function doesn't have an equivalent parameter. I looked in Statistical Computing by Michael Crawley, and it seems to suggest that the baseline hazard isn't important, since it cancels out when you calculate the likelihood of your individual dying. See Chapter 33, and pp. 615-616 in particular. My knowledge of how the model works isn't deep enough to explain the discrepancy between the MATLAB and R implementations; perhaps you could ask on the Stack Exchange Stats Analysis site.
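For what it's worth, the cancellation can be written out in the standard partial-likelihood form. This is my own sketch of the textbook argument, not a quotation from Crawley; it assumes no tied death times, and the symbols (t_i for the death time of individual i, delta_i = 1 if that death was observed, R(t_i) for the set of individuals still at risk at t_i) are my notation.

```latex
% Each observed death contributes the probability that individual i,
% rather than anyone else still at risk, is the one who dies at t_i.
L(\beta)
  = \prod_{i:\,\delta_i = 1}
    \frac{h_0(t_i)\,\exp(\beta x_i)}
         {\sum_{j \in R(t_i)} h_0(t_i)\,\exp(\beta x_j)}
  = \prod_{i:\,\delta_i = 1}
    \frac{\exp(\beta x_i)}
         {\sum_{j \in R(t_i)} \exp(\beta x_j)}
```

The factor h_0(t_i) appears in the numerator and in every term of the denominator, so it divides out, and the coefficient estimates do not depend on the baseline hazard.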