This is a very general question about R-squared or the coefficient of determination. I found a couple of threads on CV but none that answers my question in a straightforward way.
In short, what is a ‘low’ R-squared when running multiple linear regression? Below which value should we conclude that our model does no better than the baseline?
I sometimes see R-squared values as low as 0.15, yet the models are significant. I guess this depends on sample size, on whether R-squared is used for prediction or inference, and so on, but I still do not have a good intuition for it.
It also seems to me that in the ‘hard’ sciences, R-squared values tend to be high (say, 0.8 or higher in classic cases), whereas in the social sciences, from what I can see, they tend to be lower (say, under 0.5). I know this might be a gross generalization, however.
Any thoughts much appreciated.
Best Answer
Consider what $R^2$ means: the proportion of variability explained, relative to a baseline model that always predicts the mean of the response variable.
If your $R^2$ is above $0$, which it will be in-sample whenever you fit with an intercept, then you are beating that baseline performance.
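A minimal sketch of this, on simulated data (the coefficients and noise level here are arbitrary assumptions, chosen to give a weak signal): fit OLS with an intercept, then compute $R^2 = 1 - SS_{res}/SS_{tot}$, where $SS_{tot}$ is the residual sum of squares of the mean-only baseline.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical simulated data
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)  # weak signal, so expect a modest R^2

# Fit OLS with an intercept via least squares
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

ss_res = np.sum((y - fitted) ** 2)    # model's residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # baseline: always predict the mean
r2 = 1 - ss_res / ss_tot

print(round(r2, 3))  # in-sample R^2 is >= 0 when an intercept is included
```

Even though the true slope is small and the resulting $R^2$ is far below 0.8, the fitted model still beats the mean-only baseline; the same comparison on held-out data can yield a negative $R^2$, which is what "worse than the baseline" looks like.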