Simple Linear Regression Reporting – What Information to Include

I have just performed some (very) simple linear regression in Genstat and would like to include a succinct and meaningful summary of the output in my report. I'm not sure exactly what or how much of the information I should be including.

The main bits of my Genstat output look like this:

Summary of analysis 
Source      d.f.    s.s.       m.s.       v.r.    F pr.
Regression    1   8128935.   8128935.    814.41   <.001
Residual     53    529015.      9981.        
Total        54   8657950.    160332.        

Percentage variance accounted for 93.8
Standard error of observations is estimated to be 99.9.

Estimates of parameters 
Parameter    estimate    s.e.     t(53)   t pr.
Constant      41.5      30.7       1.35   0.182
UKHR_Ref       0.8659    0.0303   28.54   <.001

I was intending to report this simply as:

Adjusted R2 = 0.94 (slope = 0.87, p < 0.001; intercept not significantly different from 0).

but a colleague has suggested that I should also include at least the root mean squared error (which I believe in this case is equal to the standard error of the observations i.e. 99.9?).

Does including the RMSE provide additional useful information, or is the goodness of fit already adequately explained by the adjusted-R2 value?

Are there hard-and-fast rules for how much information to report, or is it fairly subjective?

Thanks very much!

Best Answer

For a simple linear regression, I would always produce a plot of the x variable against the y variable, with the regression line superimposed on the plot (always plot your data whenever it's feasible!). This shows at a glance how well your model fits, and it is easy to read for a one-variable regression. Adding that to what you've already got would probably be sufficient, although you may want to include some diagnostic plots (leverage, Cook's distance, residuals, etc.). It depends on how good that x-y plot looks, on your intended audience, and on any reporting protocols that audience expects.
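For a one-predictor model, the diagnostics mentioned above can be computed by hand. The following is only a minimal sketch in plain Python: the function name `simple_ols_diagnostics` and the `x`/`y` values are made up for illustration (they are not the data behind the Genstat output), and in practice you would normally take these quantities from your statistics package.

```python
def simple_ols_diagnostics(x, y):
    """Fit y = a + b*x by least squares and return basic diagnostics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    p = 2  # number of fitted parameters (intercept and slope)
    resid_ms = sum(e ** 2 for e in residuals) / (n - p)  # residual mean square

    # Leverage of each point in a simple linear regression
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Cook's distance: combines residual size and leverage
    cooks = [
        (e ** 2 / (p * resid_ms)) * (h / (1.0 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]
    return {
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
        "leverage": leverage,
        "cooks_distance": cooks,
    }


# Hypothetical data, for illustration only
x = [100, 250, 400, 550, 700, 850, 1000, 1500]
y = [130, 240, 420, 500, 660, 790, 880, 1400]

diag = simple_ols_diagnostics(x, y)
for xi, h, d in zip(x, diag["leverage"], diag["cooks_distance"]):
    print(f"x={xi:5d}  leverage={h:.3f}  Cook's D={d:.3f}")
```

Points with unusually large leverage or Cook's distance are the ones worth inspecting on the x-y plot before deciding how much diagnostic detail the report needs.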

$R^2$ vs RMSE

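$R^2$ is a relative measure: it is the proportion of the variance in y accounted for by the model, so it has no units. The RMSE is more of an absolute measure, expressed in the units of y. If the residuals are roughly normal, you would expect about two-thirds of the observations to lie within $\pm$RMSE of the fitted line, and about 95% to lie within $\pm 2$RMSE. If you want to convey "explanatory power", $R^2$ is probably better; if you want to convey "predictive power", the RMSE is probably better. The two are complementary, so reporting both is reasonable.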
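Both quantities can be recovered from the analysis-of-variance table in the question, which shows how they relate. Here is a short sketch using the sums of squares and degrees of freedom from the Genstat output above; the variable names and the helper `rough_band` are my own, and the band it returns is only a crude $\pm 2$RMSE range around the fitted line that ignores uncertainty in the estimated slope and intercept.

```python
import math

# Values copied from the Genstat "Summary of analysis" table
ss_residual = 529015.0
ss_total = 8657950.0
df_residual = 53
df_total = 54

# Parameter estimates from the "Estimates of parameters" table
intercept = 41.5
slope = 0.8659

r_squared = 1.0 - ss_residual / ss_total
adj_r_squared = 1.0 - (ss_residual / df_residual) / (ss_total / df_total)
rmse = math.sqrt(ss_residual / df_residual)  # square root of residual m.s.

print(f"R^2          = {r_squared:.3f}")      # 0.939
print(f"Adjusted R^2 = {adj_r_squared:.3f}")  # 0.938
print(f"RMSE         = {rmse:.1f}")           # 99.9


def rough_band(x_value):
    """Fitted value with a crude +/- 2*RMSE range (hypothetical helper)."""
    fitted = intercept + slope * x_value
    return fitted - 2.0 * rmse, fitted, fitted + 2.0 * rmse
```

The adjusted $R^2$ matches Genstat's "percentage variance accounted for" (93.8), and the RMSE matches its "standard error of observations" (99.9), which confirms that these are the same quantities under different names.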
