glm returns NA as coefficient for logistic regression

generalized linear model, logistic, missing data, regression, regression coefficients

I am fitting a logistic regression with a binary response (0 or 1). There are 15 explanatory variables: 10 are continuous and 5 are categorical with 3 levels each. I checked the 10 continuous variables for collinearity using pairwise correlations, and they look fine. However, R's glm function returns NA as the coefficient for one of the levels of a categorical variable.

How can I fix this problem?

Please help.

Best Answer

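This problem often indicates that your design matrix $X$ is rank-deficient (singular): some of its columns are exact linear combinations of other columns, so their coefficients are not identifiable, and glm reports NA for these aliased columns. Pairwise correlations among the continuous variables cannot rule this out, because the dependency may involve the dummy columns created for the categorical variables, or more than two columns at once. You can check for it by testing whether the rank of $X$ (equivalently, of the cross-product $X^\top X$, which has the same rank) equals the number of columns of $X$.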
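Here is a short derivation of why the rank of $X$ is the right thing to check. The symbols $n$, $p$, $W$, $\mu_i$ and $\tilde{y}$ are introduced here for illustration and are not part of the original answer. Let $X$ be the $n \times p$ design matrix. $X^\top X$ and $X$ have the same null space, so they have the same rank. glm fits logistic regression by iteratively reweighted least squares, and the weighted system it solves at each step has that same rank:

```latex
\begin{aligned}
X^\top X v = 0 &\;\Longrightarrow\; v^\top X^\top X v = \lVert X v \rVert_2^2 = 0 \;\Longrightarrow\; X v = 0,
  \quad\text{hence } \operatorname{rank}(X^\top X) = \operatorname{rank}(X),\\
(X^\top W X)\,\beta &= X^\top W \tilde{y}, \qquad W = \operatorname{diag}\bigl(\mu_i (1 - \mu_i)\bigr),\; 0 < \mu_i < 1,\\
\operatorname{rank}(X^\top W X) &= \operatorname{rank}(W^{1/2} X) = \operatorname{rank}(X).
\end{aligned}
```

If $\operatorname{rank}(X) < p$, this system has infinitely many solutions for $\beta$. glm keeps a maximal set of linearly independent columns and reports NA for the rest, so testing whether $\operatorname{rank}(X) = p$ is the appropriate check.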

In R, the rank check is easy to perform:

ncol(X) == qr(X)$rank
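When this check returns FALSE, you also need to know which columns are responsible. The following is a minimal sketch, assuming a numeric design matrix with column names, such as the one returned by `model.matrix()`. With R's default (LINPACK) algorithm, `qr()` moves the columns it judges to be linearly dependent to the end of its pivot. The columns past position `rank` in `$pivot` are therefore the aliased ones. The helper name `aliased_columns` is hypothetical.

```r
# Hypothetical helper: names of the design-matrix columns that qr() treats
# as linearly dependent on earlier columns.
aliased_columns <- function(X) {
  q <- qr(X)                      # default LINPACK QR with limited column pivoting
  if (q$rank == ncol(X)) {
    return(character(0))          # full column rank: nothing is aliased
  }
  # Columns pivoted past position 'rank' are the dependent ones
  colnames(X)[q$pivot[(q$rank + 1):ncol(X)]]
}

# Small check: column 'c' is the sum of columns 'a' and 'b'
X <- cbind(a = c(1, 0, 1, 0), b = c(0, 1, 1, 2), c = c(1, 1, 2, 2))
aliased_columns(X)   # "c"
```

The column that gets flagged depends on column order. Of a set of dependent columns, the one appearing last in the model formula is the one that is dropped, and glm follows the same convention when it assigns NA.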

Here is an R example with simulated data:

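set.seed(123)                                  # make the simulated data reproducible
N <- 10
x <- rnorm(N)                                  # continuous predictor
z <- sample(c(1, 2, 3), N, replace = TRUE)     # categorical predictor with up to 3 levels
y <- sample(c(0, 1), N, replace = TRUE)        # binary response
data <- data.frame(y = y, x = x, z = as.factor(z))
model <- glm(y ~ x + z, data = data, family = "binomial")
summary(model)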

# Get model matrix ...
X <- model.matrix(~x+z,data=data)

# Get rank of model matrix
qr(X)$rank

# Get number of parameters of the model = number of columns of model matrix
ncol(X)

# See if model matrix has full rank
ncol(X) == qr(X)$rank
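The simulated example above almost always produces a full-rank model matrix, so the check there returns TRUE. The next sketch shows the check failing and reproduces the situation described in the question. It assumes that one predictor happens to duplicate a dummy column of a factor. The variable `x2` and the data frame `df` are hypothetical and chosen only for illustration.

```r
# Hypothetical reproduction: a 0/1 predictor 'x2' equals the dummy column
# for level 3 of factor 'z', which makes the design matrix rank-deficient.
set.seed(1)
N <- 30
z <- factor(rep(1:3, length.out = N))   # all three levels present
x <- rnorm(N)
x2 <- as.numeric(z == "3")              # identical to the z3 dummy column
y <- rbinom(N, size = 1, prob = 0.5)
df <- data.frame(y = y, x = x, x2 = x2, z = z)

model <- glm(y ~ x2 + x + z, data = df, family = binomial)
X <- model.matrix(model)

ncol(X) == qr(X)$rank   # FALSE: 5 columns but rank 4
coef(model)             # the z3 coefficient is NA (aliased with x2)

# One remedy: drop the redundant predictor and refit
model2 <- glm(y ~ x + z, data = df, family = binomial)
any(is.na(coef(model2)))   # FALSE
```

Because `x2` comes before `z` in the formula, the level-3 dummy is the column flagged as aliased. This matches the symptom of a single factor level receiving NA.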