Solved – Continuous variable has very large odds ratio in binary logistic regression

binary data, logistic, odds-ratio, regression

Any help would be very much appreciated. I have run a binary logistic regression model. My dependent variable is impaired or not impaired (following stroke), and I have two predictors: age and an EEG variable called theta power. Both predictors are continuous variables. I have checked the assumptions and there is no indication of multicollinearity (both variables: VIF = 1.154, tolerance = 0.866). In addition, there is no indication that the model violates the assumption of linearity (I ran a check using linear regression).

My results show that the model is significant (Chi-square=13.779, df=2, sig=0.001). The sample is not large (only 31), so I included only two predictors with this in mind.

One of my predictors has an enormous Exp(B) value and confidence intervals to go along with it.

Age: B=0.084, S.E.=0.046, Wald=3.364, df=1, sig=0.067, Exp(B)=1.087, 95%C.I. for Exp(B)=0.994-1.189

ThetaPower: B=16.259, S.E.=8.019, Wald=4.112, df=1, sig=0.043, Exp(B)=11516574.29, 95%C.I. for Exp(B)=1.721-7.705E+13

There is only a very small variance in theta values (0.012) compared to age (202.034). Could this be the reason for the extremely large odds ratio and confidence intervals? Is there any way I can 'fix' this to keep this variable in my model and report the statistics in a more meaningful way?

My data are included now.

Best Answer

After reading Scortchi and Ben's comments, I think I may have found a solution for you.

I think the problem is the scale of your predictor. A regression coefficient represents the change in the outcome for a one-unit change in the corresponding independent variable. In logistic regression that outcome is the log-odds of being impaired, so Exp(B) is the odds ratio for a one-unit increase in the predictor.

For age, one unit is one year.

For theta, it is also a one-unit change, but note that your values are around 0.1, 0.2, ..., so a change of 1 may not be within a reasonable range of your measurement. It is like measuring age in units of 1000 years: no human lives that long, so the coefficient for such a unit would be huge.
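To make the unit effect concrete, here is a rough worked calculation that uses only the numbers reported in the question (B = 16.259 and the 95% C.I. for Exp(B)). The symbol \Delta for the size of the change in theta is my own notation, and the results are rounded, so treat this as a sketch rather than a replacement for refitting the model.

```latex
\[
\operatorname{logit}(p) = \beta_0 + \beta_{\text{age}}\,\text{age} + \beta_{\theta}\,\theta
\quad\Longrightarrow\quad
\mathrm{OR}(\Delta) = e^{\beta_{\theta}\,\Delta}
\]
\[
\mathrm{OR}(1) = e^{16.259} \approx 1.15 \times 10^{7},
\qquad
\mathrm{OR}(0.1) = e^{16.259 \times 0.1} = e^{1.6259} \approx 5.08
\]
\[
\text{95\% C.I. for } \mathrm{OR}(0.1):\quad
\left(1.721^{0.1},\ (7.705 \times 10^{13})^{0.1}\right) \approx (1.06,\ 24.5)
\]
```

So the same fitted model says the odds of impairment are multiplied by roughly 5 for each 0.1 increase in theta power, which is much easier to report than the odds ratio for a full unit.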

You can either divide your coefficient by 10 directly, which gives the effect of a 0.1-unit change, or change the unit of theta power (I don't know what it is).

For example, if you multiply your theta values by 10, the results look more reasonable:

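# theta is multiplied by 10, so one unit now equals 0.1 on the original scale
age <- c(77, 84, 45, 47, 72, 61, 78, 49, 79, 77, 74, 54, 65, 52, 80)
theta <- 10 * c(0.117, 0.443, 0.136, 0.285, 0.107, 0.113, 0.263, 0.146, 0.182, 0.299, 0.148, 0.097, 0.091, 0.151, 0.302)
impaired <- c(0, 1, 1, 1, 1, NA, 1, 0, 1, 1, 1, 0, 0, 0, NA)
mydata <- data.frame(age, theta, impaired)

# glm() drops the rows with a missing outcome (NA) by default
logistic <- glm(impaired ~ age + theta, data = mydata, family = "binomial")
summary(logistic)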

The results look much more reasonable, but they must be interpreted in the new unit: the odds ratio for theta now refers to a 0.1 increase in the original theta power.
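If you want to convince yourself that rescaling changes only the unit and not the fit, a minimal sketch like the following may help. It uses the same data values as the code above and fits the model twice; the names theta10, fit_raw, fit_scaled, raw and scaled are my own, and confint.default() gives Wald intervals, which I am assuming is what your software reported.

```r
# Data values as in the code above (two outcomes are missing).
age      <- c(77, 84, 45, 47, 72, 61, 78, 49, 79, 77, 74, 54, 65, 52, 80)
theta    <- c(0.117, 0.443, 0.136, 0.285, 0.107, 0.113, 0.263, 0.146,
              0.182, 0.299, 0.148, 0.097, 0.091, 0.151, 0.302)
impaired <- c(0, 1, 1, 1, 1, NA, 1, 0, 1, 1, 1, 0, 0, 0, NA)
mydata   <- data.frame(age, theta, theta10 = 10 * theta, impaired)

# Same model twice: theta in its original unit, then in tenths of that unit.
fit_raw    <- glm(impaired ~ age + theta,   data = mydata, family = binomial)
fit_scaled <- glm(impaired ~ age + theta10, data = mydata, family = binomial)

# Coefficient rows for the theta term from each fit.
raw    <- coef(summary(fit_raw))["theta", ]
scaled <- coef(summary(fit_scaled))["theta10", ]

# Estimate and standard error shrink by the factor 10;
# the z value and p-value should be the same in both rows.
print(rbind(raw, scaled))
print(raw["Estimate"] / scaled["Estimate"])

# Odds ratios with Wald 95% intervals; the theta10 row is per 0.1 increase in theta.
print(exp(cbind(OR = coef(fit_scaled), confint.default(fit_scaled))))
```

The test statistic and p-value for theta do not depend on the unit, so the choice of unit is purely about reporting an odds ratio that corresponds to a realistic change in the predictor.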
