Solved – Interpreting and comparing linear and quadratic regression

multiple-regression, r-squared, regression

I would like to compare the effects of predictors x and z on a dependent variable y.
I'm not sure how to tell whether z or x is 'better'/'stronger'/'more likely to be a driver' of y.

For x, when I plotted the data I noticed a quadratic relationship, so I wanted to fit a quadratic model (note that a bare x^2 in an R formula does not mean squaring; the term has to be wrapped in I()).

I wrote the polynomial regression call on my data frame dat_CV like this (here y and x are variables holding column names):

lm(dat_CV[[y]] ~ dat_CV[[x]] + I(dat_CV[[x]]^2), data = dat_CV)
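Side note: because the terms index dat_CV[[...]] directly, the data = dat_CV argument is redundant here. Assuming y and x really are strings naming columns, I believe an equivalent and cleaner call builds the formula from the names with base R's reformulate(), for example:

# Equivalent quadratic fit with the column names spliced into the formula;
# assumes y and x are strings naming columns of dat_CV
f   <- reformulate(c(x, sprintf("I(%s^2)", x)), response = y)
fit <- lm(f, data = dat_CV)
summary(fit)

This also keeps the printed coefficient names readable (x, I(x^2)) instead of the dat_CV[[x]] forms in the output below.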

My output for the model using x is:

Residuals:
    Min      1Q  Median      3Q     Max 
-0.1671 -0.0685  0.0227  0.0665  0.1144 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)   
(Intercept)       1.143040   0.230929    4.95   0.0017 **
dat_CV[[x]]       0.093053   0.022701    4.10   0.0046 **
I(dat_CV[[x]]^2) -0.001987   0.000477   -4.16   0.0042 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.101 on 7 degrees of freedom
Multiple R-squared:  0.713, Adjusted R-squared:  0.63 
F-statistic: 8.68 on 2 and 7 DF,  p-value: 0.0127

The relationship for y ~ z was linear:

lm(dat_CV[[y]] ~ dat_CV[[z]], data = dat_CV)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.0946 -0.0638 -0.0369  0.0943  0.1073 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -0.0307     0.4418   -0.07   0.9463   
dat_CV[[z]]   3.1370     0.6682    4.69   0.0016 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0911 on 8 degrees of freedom
Multiple R-squared:  0.734, Adjusted R-squared:   0.7 
F-statistic:   22 on 1 and 8 DF,  p-value: 0.00155

Regarding the quadratic results for model x:

i) I'm not sure how to interpret the p-values. Can I phrase the result in a paper like this:

Parameter x was found to have a significant quadratic relationship with y (F(2, 7) = 8.68, p = 0.013).

Should I report the p-value of the I(dat_CV[[x]]^2) term, or of both coefficient rows, instead of (or as well as) the overall model p-value?

ii) How do I interpret the fact that each coefficient is significant at p < 0.01, while the overall model is significant only at p < 0.05?

Comparing the two models

iii) Can I use $R^2$ to compare the linear and quadratic models? If not, can I compare the residual standard errors to say which model has the better goodness of fit?

i.e.

y ~ x + I(x^2): residual s.e. = 0.101

y ~ z: residual s.e. = 0.091

Therefore y ~ z is a 'slightly' better fit?
(I know the residual standard errors are almost the same here, but in other comparisons the difference between models was much bigger, so I want to understand the meaning.)

Does this mean that z is a 'better' predictor of y, even though both had significant p-values?
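For reference, since in other comparisons I check many model pairs, here is a small sketch of pulling these values programmatically (fit_x and fit_z are hypothetical names, and the columns are assumed to be literally named y, x and z):

fit_x <- lm(y ~ x + I(x^2), data = dat_CV)  # quadratic model for x
fit_z <- lm(y ~ z, data = dat_CV)           # linear model for z
sigma(fit_x)  # residual standard error: 0.101 in the output above
sigma(fit_z)  # residual standard error: 0.091 in the output above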

iv) Since a coefficient estimate in the quadratic model is no longer the slope, as it is in a simple linear regression, how can I evaluate the 'size'/'strength' of the relationship in order to compare between models?

Best Answer

As was pointed out in the comments, you need to include all of your variables in one model to understand importance. A simple and effective way to gauge a variable's importance for your model's predictive ability is the Mean Decrease in Accuracy (which can be applied to any score, such as MSE). Make sure you apply this technique to held-out data that was not used to build the model.
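A minimal sketch of that idea in base R, assuming dat_CV has columns literally named y, x and z (the [[ ]] indexing in the question suggests the names may instead live in string variables; adjust as needed). Permute one predictor at a time on the hold-out rows and record how much the MSE worsens:

set.seed(1)

# Split into training and hold-out rows
n     <- nrow(dat_CV)
test  <- sample(n, size = floor(n / 3))
train <- setdiff(seq_len(n), test)

# One model containing all predictors (x entering quadratically)
fit <- lm(y ~ x + I(x^2) + z, data = dat_CV[train, ])

# Baseline score on the hold-out set (here: MSE)
mse <- function(model, newdata) {
  mean((newdata$y - predict(model, newdata = newdata))^2)
}
baseline <- mse(fit, dat_CV[test, ])

# Mean Decrease in Accuracy: permute each predictor, remeasure the score
sapply(c("x", "z"), function(v) {
  perm      <- dat_CV[test, ]
  perm[[v]] <- sample(perm[[v]])   # break the link between v and y
  mse(fit, perm) - baseline        # larger increase = more important
})

With only ten observations, a single split and a single permutation are noisy; repeating the permutations (and ideally the split) many times and averaging the results is what puts the 'Mean' in Mean Decrease in Accuracy.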