Solved – How to adjust for confounders in logistic regression

confounding, data mining, logistic, multivariate analysis, statistical significance

I have a binary classification problem where I apply logistic regression.

I have a set of features that were found to be significant.

But I understand that logistic regression doesn't consider feature interactions.

I read online that this can be accounted for by adjusting the logistic regression for confounders.

Currently I did this and got the significant features:

import statsmodels.api as sm

# add an intercept column; sm.Logit does not include one automatically
model = sm.Logit(y_train, sm.add_constant(X_train))
result = model.fit()
result.summary()

But how do I adjust for confounders? Any pointers to a tutorial for a non-statistician like me would be really helpful.

Is there a Python package that can help with this?

Best Answer

A rule of thumb used in epidemiology, though not statistically rigorous, is the "10% rule". It states that when the odds ratio (OR) for your exposure changes by 10% or more upon including a candidate confounder in your model, that variable must be controlled for by leaving it in the model. If a 10% change in the OR is not observed, you can remove the variable from your model, as it does not need to be controlled for.
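
A minimal sketch of that check in Python with statsmodels (assuming y_train is your outcome and X_train is a pandas DataFrame; the column names "exposure" and "confounder" are placeholders for your own variables):

import numpy as np
import statsmodels.api as sm

# Crude model: exposure only
crude = sm.Logit(y_train, sm.add_constant(X_train[["exposure"]])).fit(disp=0)
# Adjusted model: exposure plus the candidate confounder
adjusted = sm.Logit(y_train, sm.add_constant(X_train[["exposure", "confounder"]])).fit(disp=0)

# Exponentiate the exposure coefficient to get its OR in each model
or_crude = np.exp(crude.params["exposure"])
or_adjusted = np.exp(adjusted.params["exposure"])

# 10% rule: keep the confounder if the exposure OR shifts by 10% or more
pct_change = abs(or_adjusted - or_crude) / or_crude * 100
print(f"OR change: {pct_change:.1f}%")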

EDIT:

What I think you are asking is:

1.) How to adjust for confounding in the analysis.

2.) Whether interaction is different from confounding.

An important thing to understand about confounding is that it is generally assessed on a dataset-by-dataset basis, which is where the 10% rule comes in. Essentially, if the OR of your exposure/outcome relationship does not change by 10% or more after adding the third variable to the model, there is not good enough evidence of confounding to keep it in the model. This is the case even if other literature suggests the third variable may be a confounder. You could hypothetically put one exposure and 10 extra variables in your model, but if none of them change the exposure/outcome relationship (OR), then you should not keep them in the model. Leaving the variables in the model will control for them, but if they do not change your exposure/outcome relationship, they are unnecessary and will only cloud your conclusions. This article gives some other methods for addressing confounding, including the 10% rule (just click on "View PDF"): Hernan 2002
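
To screen several candidate variables this way, the same check can be run in a loop; again a rough sketch, with "exposure" a placeholder column name and the other columns of X_train treated as candidates:

import numpy as np
import statsmodels.api as sm

candidates = [c for c in X_train.columns if c != "exposure"]
crude_or = np.exp(
    sm.Logit(y_train, sm.add_constant(X_train[["exposure"]])).fit(disp=0).params["exposure"]
)

for c in candidates:
    fit = sm.Logit(y_train, sm.add_constant(X_train[["exposure", c]])).fit(disp=0)
    adj_or = np.exp(fit.params["exposure"])
    change = abs(adj_or - crude_or) / crude_or * 100
    # Flag variables whose inclusion moves the exposure OR by 10% or more
    print(f"{c}: OR {adj_or:.2f} ({change:.1f}% change){'  <- keep' if change >= 10 else ''}")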

Interaction differs from confounding in that your exposure/outcome relationship is different at different levels of a third variable. Rather than the third variable shifting the OR the way a confounder does, there will be different ORs for different categories of the third variable. An easy example uses gender (male/female). Say you are interested in the effect of cigarette smoking on cancer, and you conclude that smoking gives an OR of 3.0 for cancer. However, you wish to examine whether gender is an interactive term. After entering the interaction term into your model, you find that the OR for cigarette smoking and cancer among males is 2.0 and among females is 4.0. In this case, reporting an OR of 3.0 would be misleading, as there is a clear difference between males and females.
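
One way to fit such an interaction is the statsmodels formula API; a sketch assuming a DataFrame df with 0/1 columns cancer, smoking, and male (all names illustrative):

import numpy as np
import statsmodels.formula.api as smf

# "smoking * male" expands to smoking + male + smoking:male,
# letting the smoking effect differ by gender
fit = smf.logit("cancer ~ smoking * male", data=df).fit(disp=0)

# OR for smoking among females (male = 0)
or_female = np.exp(fit.params["smoking"])
# OR for smoking among males (male = 1): main effect plus interaction
or_male = np.exp(fit.params["smoking"] + fit.params["smoking:male"])
print(f"OR females: {or_female:.2f}, OR males: {or_male:.2f}")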

Please let me know if this has clarified any issues you have.