Solved – Why is the regression insignificant when I merge data that produced two significant regressions

regression, regression coefficients, statistical significance

Sorry for the confusing title. I think this is a general statistics question, but I'm working in R. I have two samples from different countries (n = 240 and n = 1,010). When I run a linear regression on the same three variables in each sample separately, both produce a significant result, with almost identical coefficients. However, when I merge the samples and run the same regression on the combined dataset, the result is no longer significant. Can anyone explain this?

In case it matters, the regression has the form lm(a~b*c).

Best Answer

Without seeing your data, this is difficult to answer definitively. One possibility is that your two datasets span different ranges of the independent variables. Combining data across groups can weaken, or even reverse, the correlations seen within each group individually; this effect is known as Simpson's paradox. The pooled relationship does not have to reverse outright for you to see what you describe: it only has to shrink enough to lose significance.
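
As a rough illustration of that possibility, here is a small simulated example in R. Everything in it is an assumption made for the sketch rather than something taken from your data: the variable names, the country labels A and B, the simpler model a ~ b in place of lm(a~b*c), and in particular the constant shift applied to country B, which is chosen deliberately so that the pooled covariance cancels. Only the sample sizes come from the question.

```r
# Illustrative simulation only: all names and numbers are made up,
# except the two sample sizes, which come from the question.
set.seed(1)

n1 <- 240
n2 <- 1010

country <- factor(rep(c("A", "B"), times = c(n1, n2)))

# The two samples cover different ranges of the predictor b.
b <- c(rnorm(n1, mean = 0), rnorm(n2, mean = 3))

# Same true slope (1) in both countries.
a <- b + rnorm(n1 + n2)

# Shift country B's responses by a constant chosen so that the pooled
# covariance between a and b cancels. A constant shift within one
# country leaves that country's own slope untouched.
is_b <- as.numeric(country == "B")
a <- a - is_b * cov(a, b) / cov(is_b, b)

d <- data.frame(a, b, country)

# Helper (hypothetical name): p-value of the slope on b in a fitted lm.
slope_p <- function(fit) summary(fit)$coefficients["b", "Pr(>|t|)"]

fit_a      <- lm(a ~ b, data = d, subset = country == "A")
fit_b      <- lm(a ~ b, data = d, subset = country == "B")
fit_pooled <- lm(a ~ b, data = d)            # ignores country
fit_adj    <- lm(a ~ b + country, data = d)  # separate intercept per country

print(c(A = slope_p(fit_a), B = slope_p(fit_b),
        pooled = slope_p(fit_pooled), adjusted = slope_p(fit_adj)))
```

With this construction, the slope on b should come out clearly significant within each country, while the pooled slope is zero by design. Adding country to the model (a ~ b + country) is one way to check whether a difference between the groups is what masks the effect in real data; whether that is what is happening in your case depends on your data.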
