Hi!
I have been reading the code used by R to fit a generalized linear model (GLM), since the source-code of R is freely available. The algorithm used is called iteratively reweighted least squares (IRLS), which is a fairly documented algorithm. For each iteration, there is a call to a Fortran function to solve the weighted least squares problem.
From the end-user's viewpoint, for a logistic regression for instance, a call in R looks just like this:
y <- rbinom(100, 1, 0.5)
x <- rnorm(100)
glm(y~x, family=binomial)$coefficients
And if you do not want to use an intercept, either of these calls is okay:
glm(y~x-1, family=binomial)$coefficients
glm(y~x+0, family=binomial)$coefficients
However, I cannot manage to understand how the formula, i.e. y~x
or y~x-1
, is making sense in the code and being understood as for whether to use an intercept or not. I was looking for a part of the code where a column of ones would be bound to x
, but it seems there is none.
Thanks.
PS: As far as I have read, the boolean intercept which appears in the function called glm.fit
is not the same as the intercept which I am referring to. And it is the same for the offset.
The documentation about glm
and glm.fit
is here.