Regression – Can R-Squared Be Too Low in Multiple Linear Regression?

Tags: r-squared, regression

This is a very general question about R-squared or the coefficient of determination. I found a couple of threads on CV but none that answers my question in a straightforward way.

In short, what counts as a ‘low’ R-squared when running a multiple linear regression? Below which minimum value should we conclude that our model does no better than the baseline?

I sometimes see R-squared values as low as 0.15, yet the models are statistically significant. I suppose this depends on sample size, on whether R-squared is used for prediction or inference, and so on; even so, I still do not have a good intuition for it.

It also seems to me that in the ‘hard’ sciences, R-squared tends to be high (say, 0.8 or higher in classic cases), whereas in the social sciences, from what I can see, it tends to be lower (say, under 0.5). I know this may be a gross generalization, however.

Any thoughts much appreciated.

Best Answer

Consider what $R^2$ means: the proportion of variability explained, compared to a baseline model that always predicts the mean of the observed response variable.
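In symbols, with $\hat{y}_i$ the fitted values and $\bar{y}$ the mean of the observed responses, this is:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

The second term is the ratio of the model's squared error to the baseline's squared error, so $R^2 > 0$ exactly when the model's in-sample squared error is smaller than that of always guessing $\bar{y}$.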

If your $R^2$ is above zero, which it almost always will be on in-sample data when the model includes an intercept, then you are beating the baseline's performance.
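A minimal NumPy sketch of this (the simulated data, coefficient values, and variable names are my own, chosen so that $R^2$ lands around 0.15, the ballpark from the question): a weak but real linear signal buried in noise still yields a positive in-sample $R^2$, i.e. it beats the always-guess-the-mean baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: a weak linear signal (slope 0.4) plus unit-variance noise.
n = 500
x = rng.normal(size=n)
y = 0.4 * x + rng.normal(scale=1.0, size=n)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

# R^2 = 1 - SS_residual / SS_total,
# where the baseline model always predicts mean(y).
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.3f}")
```

With these values $R^2$ comes out to roughly $0.4^2 / (0.4^2 + 1) \approx 0.14$: low in absolute terms, yet the model clearly does better than predicting the mean.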