Multivariate Analysis – When Regression is Not Significant Univariately but Significant with Controls


I have data on job type (a dummy variable equal to 1 for a bad job and 0 for a good job) and on the amount of debt individuals held before finding a job. When I ran a regression of job type on debt level alone, debt was not significant (the p-value was around 0.5). However, when I added controls (e.g. major while in school, IQ score), the coefficient on debt became significant (p-value below 0.05). I expected the opposite, i.e. that the univariate case would be more significant than the model with more controls. Does this mean I am doing something wrong in the multivariate regression? Or can I say that the amount of debt becomes significant after controlling for the additional X variables? I would really appreciate any advice on this. Thank you very much!

Best Answer

To answer the question: no, I don't think this means you are doing anything wrong.

In fact, I think this is a good demonstration of why it's useful to fit many regression models when exploring data. This is one of Gelman and Hill's "ten tips to improve your regression modeling":

Think of a series of models, starting with the too-simple and continuing through to the hopelessly messy. Generally it’s a good idea to start simple. Or start complex if you’d like, but prepare to quickly drop things out and move to the simpler model to help understand what’s going on. Working with simple models is not a research goal—in the problems we work on, we usually find complicated models more believable—but rather a technique to help understand the fitting process.
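To make the tip concrete, here is a minimal Python/NumPy sketch that fits a series of nested models, from debt alone to debt plus controls, and reports the debt coefficient and its t-statistic for each. Everything about the data is an assumption for illustration: the variable names (`bad_job`, `debt`, `major`, `iq`), the effect sizes, and the data-generating process are invented, not taken from the question. In the made-up process, debt raises the probability of a bad job, but a hypothetical major that raises debt lowers that probability by just enough to cancel the marginal association. The models are fit by ordinary least squares on the 0/1 outcome (a linear probability model) to keep the sketch dependency-free beyond NumPy; a real analysis of a binary outcome might use a logit instead.

```python
import numpy as np


def ols(y, X):
    """OLS fit; X must already contain an intercept column.
    Returns (coefficients, conventional homoskedastic standard errors)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


def simulate(n=5000, seed=0):
    """Hypothetical data-generating process (an assumption, not real data).
    Debt raises P(bad job) with slope 0.01; a 'major' that adds 10 to debt
    lowers P(bad job) by 0.2, which makes debt and job type uncorrelated
    marginally. IQ affects job type but is independent of debt."""
    rng = np.random.default_rng(seed)
    major = rng.binomial(1, 0.5, n).astype(float)
    iq = rng.normal(100, 15, n)
    debt = 20 + 10 * major + rng.normal(0, 5, n)
    p = 0.5 + 0.01 * (debt - 25) - 0.2 * (major - 0.5) - 0.004 * (iq - 100)
    bad_job = rng.binomial(1, np.clip(p, 0, 1)).astype(float)
    return bad_job, debt, major, iq


def debt_series(n=5000, seed=0):
    """Fit nested linear probability models, simplest first, and return
    {model name: (coefficient on debt, t-statistic for debt)}."""
    bad_job, debt, major, iq = simulate(n, seed)
    ones = np.ones_like(debt)
    specs = {
        "debt only": [ones, debt],
        "debt + major": [ones, debt, major],
        "debt + major + iq": [ones, debt, major, iq],
    }
    results = {}
    for name, cols in specs.items():
        beta, se = ols(bad_job, np.column_stack(cols))
        results[name] = (beta[1], beta[1] / se[1])  # debt is always column 1
    return results


if __name__ == "__main__":
    for name, (coef, t) in debt_series().items():
        print(f"{name:>20}: debt coef = {coef:+.4f}, t = {t:+.2f}")
```

Under this invented process the population slope on debt is exactly zero in the debt-only model and 0.01 once major is included, so the first row should show a small t-statistic and the controlled rows a clearly positive one; the exact numbers depend on the seed.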

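In your case, the univariate model is almost certainly too simple. It is highly implausible that the only determinant of whether you get a good job is the amount of previous debt. As a result, the coefficient you get is probably affected by omitted variable bias: if a left-out factor is correlated with debt and also affects job type, its effect gets absorbed into the debt coefficient. When that factor pushes in the opposite direction (say, a major that tends to involve more debt but also leads to better jobs), it can offset a real association and pull the univariate coefficient toward zero.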
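The omitted-variable-bias argument can be checked numerically with the standard decomposition for a single omitted control: in least squares, the slope on debt from the short regression (debt only) equals the slope on debt in the long regression plus the long-regression coefficient on the omitted variable times the slope from regressing that variable on debt. Below is a minimal Python/NumPy sketch of that identity. The function names and the simulated data are hypothetical, made up for illustration. Note the identity is exact only for linear least squares (for example, a linear probability model on the 0/1 job variable); it does not carry over exactly to a logit or probit.

```python
import numpy as np


def ols_coefs(y, *regressors):
    """OLS coefficients for y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def ovb_decomposition(y, x, z):
    """Return (short slope, long slope, bias term) for one omitted control z.

    short slope: coefficient on x in  y ~ x
    long slope:  coefficient on x in  y ~ x + z
    bias term:   (coefficient on z in y ~ x + z) * (slope of z ~ x)
    For OLS, short slope == long slope + bias term (up to rounding).
    """
    b_short = ols_coefs(y, x)[1]
    _, b_x, b_z = ols_coefs(y, x, z)
    delta = ols_coefs(z, x)[1]  # auxiliary regression of the control on x
    return b_short, b_x, b_z * delta


if __name__ == "__main__":
    # Hypothetical simulated data, not the asker's: 'major' raises debt
    # but lowers the chance of a bad job.
    rng = np.random.default_rng(1)
    n = 1000
    major = rng.binomial(1, 0.5, n).astype(float)
    debt = 20 + 10 * major + rng.normal(0, 5, n)
    p = np.clip(0.5 + 0.01 * (debt - 25) - 0.2 * (major - 0.5), 0, 1)
    bad_job = rng.binomial(1, p).astype(float)
    short, direct, bias = ovb_decomposition(bad_job, debt, major)
    print(f"short = {short:+.4f}, long = {direct:+.4f}, bias = {bias:+.4f}")
    print(np.isclose(short, direct + bias))  # True: the decomposition is exact
```

Here the bias term is negative because the omitted control is positively related to debt but negatively related to a bad job, which is exactly the situation in which leaving it out masks the debt effect.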

So rather than saying that debt "becomes significant after adding controls", I might say that "debt is significant in a more plausible model that accounts for other factors that affect the response variable".