I would like to compare the effect of parameters x and z on the dependent variable y.
I'm not sure how to determine whether z or x is 'better'/'stronger'/'more likely to be a driver' of y.
For x, when I plotted the data I noticed a quadratic relationship, so I fitted y as a quadratic function of x.
I wrote the polynomial regression call against my data frame dat_CV (x and y here are character strings naming its columns) like this:

lm(dat_CV[[y]] ~ dat_CV[[x]] + I(dat_CV[[x]]^2), data = dat_CV)
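A sketch of an equivalent, more idiomatic call, assuming x and y really do hold column names as strings (reformulate() is base R and builds the formula from those names, which also gives the coefficients readable labels):

# build y ~ x + I(x^2) from the column-name strings, then fit on dat_CV
f <- reformulate(c(x, sprintf("I(%s^2)", x)), response = y)
fit_x <- lm(f, data = dat_CV)
summary(fit_x)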
My output for the model using x is:
Residuals:
     Min       1Q   Median       3Q      Max
-0.1671  -0.0685   0.0227   0.0665   0.1144

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        1.143040   0.230929    4.95   0.0017 **
dat_CV[[x]]        0.093053   0.022701    4.10   0.0046 **
I(dat_CV[[x]]^2)  -0.001987   0.000477   -4.16   0.0042 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.101 on 7 degrees of freedom
Multiple R-squared: 0.713, Adjusted R-squared: 0.63
F-statistic: 8.68 on 2 and 7 DF, p-value: 0.0127
The relationship for y ~ z was linear, so I fitted:

lm(dat_CV[[y]] ~ dat_CV[[z]], data = dat_CV)

The output for this model is:
Residuals:
     Min       1Q   Median       3Q      Max
-0.0946  -0.0638  -0.0369   0.0943   0.1073

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.0307     0.4418   -0.07   0.9463
dat_CV[[z]]   3.1370     0.6682    4.69   0.0016 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0911 on 8 degrees of freedom
Multiple R-squared: 0.734, Adjusted R-squared: 0.7
F-statistic: 22 on 1 and 8 DF, p-value: 0.00155
Regarding the quadratic results of the model for x:
i) I'm not sure how to interpret the p-values. Can I phrase the result in a paper like this:
Parameter x was found to have a significant quadratic relationship with y (F(2,7) = 8.68, p = 0.013).
Should I be reporting the p-value of the I(dat_CV[[x]]^2) row, or of both coefficient rows, either instead of or as well as the overall model's p-value?
ii) How do I interpret the fact that the p-values are significant at p < 0.01 for each parameter, but the overall model is only significant at p < 0.05?
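For reference, the overall F-test that summary() reports compares the full fit against an intercept-only model; a minimal sketch of how to see that, reusing the call above:

fit_x <- lm(dat_CV[[y]] ~ dat_CV[[x]] + I(dat_CV[[x]]^2), data = dat_CV)  # quadratic fit from above
fit_0 <- lm(dat_CV[[y]] ~ 1, data = dat_CV)                               # intercept-only null model
anova(fit_0, fit_x)  # should reproduce F = 8.68 on 2 and 7 DF, p = 0.0127

Each coefficient row, by contrast, tests a single term given that the other terms are already in the model.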
Comparing the two models
iii) Can I use the $R^2$ values to compare the linear and quadratic models? If not, can I compare the residual standard errors to say which model is the better fit?
i.e.
y ~ x + I(x^2): resid. s.e. = 0.101
y ~ z: resid. s.e. = 0.091
Therefore y ~ z is a 'slightly' better fit?
(I know the residual standard errors here are almost the same, but in other comparisons the difference between models was much bigger, so I want to understand what the comparison means.)
Does this mean that z is a 'better' predictor of y, even though both had significant p-values?
iv) Since the coefficient estimates in a quadratic model no longer describe a single slope, as they do in a linear regression, how can I evaluate the 'size'/'strength' of the relationship in order to compare between models?
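For reference, a minimal sketch of lining the two fits up side by side (adjusted R^2 and AIC penalise the quadratic model's extra term, unlike the raw R^2; the AIC values are only comparable because both models are fit to the same observations of y):

fit_x <- lm(dat_CV[[y]] ~ dat_CV[[x]] + I(dat_CV[[x]]^2), data = dat_CV)
fit_z <- lm(dat_CV[[y]] ~ dat_CV[[z]], data = dat_CV)

# one row per model: adjusted R^2, residual s.e., and AIC
data.frame(
  model    = c("y ~ x + x^2", "y ~ z"),
  adj_r2   = c(summary(fit_x)$adj.r.squared, summary(fit_z)$adj.r.squared),
  resid_se = c(summary(fit_x)$sigma, summary(fit_z)$sigma),
  aic      = c(AIC(fit_x), AIC(fit_z))
)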
Best Answer
As was pointed out in the comments, you need to include all of your variables in a single model to understand importance. A simple and effective way to gauge a variable's importance for your model's ability to make good predictions is the Mean Decrease in Accuracy: permute one variable at a time and measure how much a chosen score (such as MSE) degrades. Make sure you apply this technique to hold-out data that was not used to build the model.
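A minimal sketch of that idea in base R, assuming dat_CV literally has columns named y, x, and z (all names below are illustrative) and using the increase in hold-out MSE after permuting a variable as the 'decrease in accuracy':

set.seed(1)

# fit one model containing both predictors on a training split,
# keeping the remaining rows as hold-out data
n         <- nrow(dat_CV)
train_idx <- sample(n, floor(0.7 * n))
train     <- dat_CV[train_idx, ]
holdout   <- dat_CV[-train_idx, ]

fit <- lm(y ~ x + I(x^2) + z, data = train)

baseline_mse <- mean((holdout$y - predict(fit, newdata = holdout))^2)

# Mean Decrease in Accuracy via permutation: shuffle one predictor at a
# time in the hold-out set and record how much the MSE gets worse
importance <- sapply(c("x", "z"), function(v) {
  perm      <- holdout
  perm[[v]] <- sample(perm[[v]])  # break the link between v and y
  mean((holdout$y - predict(fit, newdata = perm))^2) - baseline_mse
})

importance  # larger increase in MSE => more important predictor

With a data set as small as the ten rows shown here, a single split is very noisy; repeating the permutation (and the split) many times and averaging would give a more stable ranking.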