Regression Analysis – Why Adding a Linear Regression Predictor Decreases R Squared

Tags: linear, r-squared, regression

My dataset ($N \approx 10,000$) has a dependent variable (DV), five independent "baseline" variables (P1, P2, P3, P4, P5) and one independent variable of interest (Q).

I have run OLS linear regressions for the following two models:

DV ~ 1 + P1 + P2 + P3 + P4 + P5
                                  -> R-squared = 0.125

DV ~ 1 + P1 + P2 + P3 + P4 + P5 + Q
                                  -> R-squared = 0.124

I.e., adding the predictor Q has decreased the amount of variance explained by the linear model. As far as I understand, this shouldn't happen.
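
My reasoning, for reference (the standard textbook argument as I understand it): with OLS fit on the same sample,

$$R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},$$

and the model with Q nests the model without it, so the larger model can always reproduce the smaller model's fit by setting the coefficient on Q to zero. Its residual sum of squares can therefore only stay the same or shrink, and its $R^2$ can only stay the same or grow.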

To be clear, these are R-squared values and not adjusted R-squared values.

I've verified the R-squared values using Jasp and Python's statsmodels.
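
For reference, this is roughly how I computed them with statsmodels' formula API (the DataFrame and file name below are placeholders for my actual data):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder for my actual data; columns are DV, P1-P5, Q.
df = pd.read_csv("data.csv")

base = smf.ols("DV ~ P1 + P2 + P3 + P4 + P5", data=df).fit()
full = smf.ols("DV ~ P1 + P2 + P3 + P4 + P5 + Q", data=df).fit()

print(base.rsquared, full.rsquared)  # reported as 0.125 and 0.124
```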

Is there any reason I could be seeing this phenomenon? Perhaps something relating to the OLS method?

Best Answer

Could it be that you have missing values in Q that are being dropped automatically? That would mean the second regression is fit on a smaller, different sample, making the two R-squared values not comparable.
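
Here is a self-contained sketch of that mechanism, using synthetic data and statsmodels' formula API (which drops rows with missing values in any variable of the formula by default); the column names simply mirror your description:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 10_000

# DV depends on P1-P5 only; Q is pure noise, so it should add ~nothing.
df = pd.DataFrame(rng.normal(size=(n, 6)),
                  columns=["P1", "P2", "P3", "P4", "P5", "Q"])
df["DV"] = df[["P1", "P2", "P3", "P4", "P5"]].sum(axis=1) + rng.normal(scale=4, size=n)

# Knock out Q for rows that happen to be well explained by P1-P5, so the
# rows that remain for the "full" model are the harder-to-fit ones.
resid = smf.ols("DV ~ P1 + P2 + P3 + P4 + P5", data=df).fit().resid
df.loc[resid.abs().nsmallest(3000).index, "Q"] = np.nan

base = smf.ols("DV ~ P1 + P2 + P3 + P4 + P5", data=df).fit()      # all 10,000 rows
full = smf.ols("DV ~ P1 + P2 + P3 + P4 + P5 + Q", data=df).fit()  # rows with Q missing are dropped

print(base.nobs, full.nobs)          # different sample sizes
print(base.rsquared, full.rsquared)  # the "larger" model's R-squared comes out lower

# Refitting both models on the common complete-case sample restores the
# usual guarantee that R-squared cannot decrease when a predictor is added.
cc = df.dropna()
print(smf.ols("DV ~ P1 + P2 + P3 + P4 + P5", data=cc).fit().rsquared,
      smf.ols("DV ~ P1 + P2 + P3 + P4 + P5 + Q", data=cc).fit().rsquared)
```

In your own data, comparing the `nobs` attribute of the two fitted results (or counting missing values in Q) should tell you quickly whether this is what is going on.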