Solved – Interpretation of insignificant predictors in logistic regression model

logistic, r, stepwise regression

First I should explain what I did, which might not be right.
I have a variable that represents a test outcome, which can be positive or negative.

  • I have a set of observations of one important variable (my point of interest) for each of the last 5 days before the test was undertaken. I have computed its average over the last 3 days and over all 5 days.

  • Then I have some other variables (less important to me); some are binary (yes or no) and some are continuous.

I want to create a logit model for my DV based on these variables.

As the histograms don't look normal enough, I used the Mann-Whitney U test on all the variables (with the test outcome as the grouping variable) and saw that the result was most significant for the 5-day average (the second most significant was the value for day 1 before the test). So I chose the 5-day average for a univariate logit model, and it was significant. Then I put all the other variables into the model and ran a stepwise model selection in R based on AIC. Now I have a model that contains 5 variables: the 5-day average is highly significant, two other variables are also significant, and two variables are non-significant at my chosen level (0.05). I have two questions:

  1. Is the way I selected one version of the "important" variable for the model acceptable?
  2. How do I interpret the two non-significant variables? They have not been shown to have a significant influence on the DV, but in a model without them, the other variables become insignificant. Can I actually use a model that contains non-significant variables? Or can I say that the model fits the available data well, and that the two variables contribute to the model but are not significant at my chosen level?
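For context on the second question, here is a minimal sketch (Python, standard library only, not the R code used above) of why an AIC-based search can keep terms with p > 0.05. It assumes each term adds a single parameter and that significance is judged by a likelihood-ratio chi-square test; the Wald p-values that R's `summary()` prints will only roughly agree with this. The helper name `chi2_sf_1df` is made up for illustration.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    # With 1 df, X = Z**2 for a standard normal Z, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


# AIC = -2 * logLik + 2 * k, where k is the number of parameters.
# Dropping a 1-parameter term lowers AIC only if the term's
# likelihood-ratio statistic is below 2, so the term is kept when LR > 2.
aic_penalty_per_parameter = 2.0
implied_alpha = chi2_sf_1df(aic_penalty_per_parameter)

# For comparison: the chi-square cutoff that corresponds to alpha = 0.05.
conventional_alpha = chi2_sf_1df(3.841459)

print(f"p-value cutoff implied by AIC for a 1-df term: {implied_alpha:.4f}")
print(f"p-value at the chi-square cutoff 3.84:         {conventional_alpha:.4f}")
```

Under these assumptions the cutoff implied by AIC is roughly 0.157 rather than 0.05, so a term with a p-value of, say, 0.10 can stay in the AIC-selected model while being "non-significant" at the 0.05 level. In R, the same number should come from `pchisq(2, df = 1, lower.tail = FALSE)`.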

Best Answer

It is not appropriate to pre-test variables in order to select which ones should be modeled. Instead, use subject matter expertise or possibly data reduction (blinded to $Y$, that is, done without looking at the outcome). There is absolutely nothing wrong with having "insignificant" variables in a model, and in fact this is a sign that you are doing things correctly.
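The problem with pre-testing can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reconstruction of the analysis in the question: the outcome is unrelated to every predictor, the candidate predictors are independent standard normal noise (the daily values and averages in the question would be correlated, which makes the effect smaller), and a two-sample z-test stands in for the Mann-Whitney U test. The function names `two_sample_p` and `simulate` and all the default settings are hypothetical.

```python
import math
import random


def two_sample_p(a, b):
    """Two-sided p-value from a Welch-type z-test (normal approximation)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    z = (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(n_sims=2000, n_subjects=100, n_candidates=5, seed=1):
    """Return false-positive rates for a pre-specified vs. a screened predictor."""
    rng = random.Random(seed)
    half = n_subjects // 2
    prespecified_hits = 0
    screened_hits = 0
    for _ in range(n_sims):
        p_values = []
        for _ in range(n_candidates):
            # Pure noise: the first half plays the "positive" group, the
            # second half the "negative" group, so there is no true effect.
            x = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
            p_values.append(two_sample_p(x[:half], x[half:]))
        # Strategy 1: always test the candidate chosen in advance.
        prespecified_hits += int(p_values[0] < 0.05)
        # Strategy 2: test every candidate and keep the most significant one.
        screened_hits += int(min(p_values) < 0.05)
    return prespecified_hits / n_sims, screened_hits / n_sims


prespecified_rate, screened_rate = simulate()
print(f"pre-specified predictor 'significant': {prespecified_rate:.3f}")
print(f"best-of-5 screened predictor 'significant': {screened_rate:.3f}")
```

With independent candidates, the chance that the best of five noise variables passes the 0.05 threshold is about 1 - 0.95^5 ≈ 0.23, against about 0.05 for a variable chosen in advance. That is why a p-value reported after such screening cannot be taken at face value.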
