Regression – How to Check for Confounding and Mediation in a Large Dataset

confounding, epidemiology, mediation, multiple regression, regression

Given a large dataset, one cannot possibly check every model. In particular, it is not clear to me that one can check for either confounding or mediation.

How does one check for confounding or mediation in a large-data context? I am not an epidemiologist. Does the question even make sense when a large dataset produces so many spurious relationships? With 5 variables there might be hope; I cannot see any in data with 100 variables.

Best Answer

The size of the dataset helps only a little. The only way I know to check for confounding is to convene a large group of experts in the subject matter, and in the operations and procedures related to it, and have them list every factor they believe may be used in the decision-making process that determines the exposure or treatment you are interested in. Then see whether those factors (or variables highly correlated with them) were collected in your dataset, and whether they are measured accurately enough and without too many missing values.
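The second step, checking the experts' list against what was actually collected, can be partly mechanized. The following is only a minimal sketch under stated assumptions: the function `audit_candidate_confounders`, the column names, the toy records, and the 25% missingness threshold are all hypothetical, and the proxy columns are assumed to have been named by the experts rather than discovered from the data. It does not assess measurement accuracy, which still needs subject-matter judgment.

```python
def audit_candidate_confounders(rows, candidates, max_missing=0.2):
    """Check which expert-listed factors are available in the data.

    rows        -- list of dicts, one per record; None marks a missing value
    candidates  -- dict mapping each expert-listed factor to the column
                   names that could represent it (the factor itself or a
                   proxy the experts consider highly correlated with it)
    max_missing -- largest tolerated fraction of missing values (assumed)
    """
    n = len(rows)
    report = {}
    for factor, columns in candidates.items():
        best = None  # (column name, fraction missing) of the best column so far
        for col in columns:
            # Skip columns that were never collected.
            if not any(col in r for r in rows):
                continue
            missing = sum(1 for r in rows if r.get(col) is None) / n
            if best is None or missing < best[1]:
                best = (col, missing)
        if best is None:
            report[factor] = {"column": None, "missing": None, "usable": False}
        else:
            report[factor] = {
                "column": best[0],
                "missing": best[1],
                "usable": best[1] <= max_missing,
            }
    return report


# Hypothetical toy records; a real dataset would be far larger.
rows = [
    {"treated": 1, "age": 64, "smoker": None, "insurance_type": "public"},
    {"treated": 0, "age": 51, "smoker": None, "insurance_type": "private"},
    {"treated": 1, "age": None, "smoker": 1, "insurance_type": "public"},
    {"treated": 0, "age": 47, "smoker": 0, "insurance_type": "private"},
]

# Hypothetical expert list: factor -> acceptable columns (proxies included).
candidates = {
    "age": ["age"],
    "smoking": ["smoker"],
    "income": ["income", "insurance_type"],
    "disease severity": ["severity_score"],
}

report = audit_candidate_confounders(rows, candidates, max_missing=0.25)
for factor, info in report.items():
    print(factor, info)
```

Factors flagged as not usable, such as the uncollected severity score in this toy example, are the ones that leave the analysis open to unmeasured confounding regardless of how many records the dataset has.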
