Solved – Comparing two linear regression models

Tags: linear, linear model, multiple regression, regression

I have fit two linear regressions to estimate y, and I get these results:

First:

Residual standard error: 1.021 on 276 degrees of freedom
Multiple R-squared:  0.2347,    Adjusted R-squared:  0.2059 
F-statistic: 8.362 on 10 and 276 DF,  p-value: 6.878e-12

Second:

Residual standard error: 1.025 on 273 degrees of freedom
Multiple R-squared:  0.2312,    Adjusted R-squared:  0.1945 
F-statistic: 6.314 on 13 and 273 DF,  p-value: 2.085e-10

I know from $R^2$ that these models are not good, but which one is better than the other? Can someone explain the other figures besides $R^2$? Should I maybe use an ANOVA to compare them?

Best Answer

As you correctly notice, the $R^2$ associated with each model is very similar ($0.235$ vs. $0.231$). In general, though, neither of the two is necessarily "not good": whether an $R^2$ is bad or good depends entirely on the actual application. As mentioned in my comment, unless you have good reasons to believe that your explanatory variables have strong linear relations with your dependent variable and that you have not omitted other variables with strong influence, an $R^2 \approx 0.23$ is far from catastrophic.

The obvious thing to suggest is to look at some kind of information criterion (the Akaike Information Criterion, AIC, or the Bayesian Information Criterion, BIC) to see whether either model is clearly better. Let me point out that these are not silver bullets: they make their own assumptions that have to be met. For instance, both models must be fit to the same response and the same data, and the ANOVA $F$-test you mention additionally requires that one model be nested in the other (which holds here only if the 13-predictor model contains all 10 predictors of the first).
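To make this concrete, here is a minimal sketch in R. The fits fit1 and fit2, the data frame dat, and the variable names are placeholders standing in for your 10- and 13-predictor models, not anything taken from your output:

fit1 <- lm(y ~ x1 + x2 + x3, data = dat)            # placeholder for the 10-predictor model
fit2 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = dat)  # placeholder for the 13-predictor model

AIC(fit1, fit2)    # information criteria: lower is better
BIC(fit1, fit2)

anova(fit1, fit2)  # F-test comparison; valid only because fit1 is nested in fit2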

The figure "next to $R^2$" in your output is the adjusted $R^2$: this is essentially the coefficient of determination, but penalized so that it accounts for the number of explanatory variables in your model; concretely, $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}$, where $n$ is the number of observations and $p$ the number of predictors.

Short of cross-validating or bootstrapping your models, I would suggest using the AIC with a correction for finite sample sizes, dubbed AICc; the AICc function in the package AICcmodavg computes it (see the sketch at the end of this answer).

Cross Validated has some excellent threads on the perils of automatic model selection (e.g. here and here); I highly recommend reading them. To paraphrase the late George E. P. Box: your model is certainly wrong, you just want to see whether it is useful. :)
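Here is the corresponding sketch, reusing the placeholder fits fit1 and fit2 from above:

library(AICcmodavg)

AICc(fit1)  # small-sample corrected AIC; lower is better
AICc(fit2)

# Or rank the whole candidate set in one table
aictab(cand.set = list(fit1, fit2),
       modnames = c("10 predictors", "13 predictors"))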