I would like to run a multiple linear regression to test if my protein of interest differs among different stages of a disease, while controlling for covariates/ confounders such as age and gender. The disease stage variable has four levels. I would like to compare the "control" stage with the three remaining stages. From what I understand, once I specify the disease stage variable as categorical and specify the reference group ("control"), the multiple regression output will compare the mean of the "control" group with each disease stage. I believe this is similar to running an ANCOVA. However, in ANCOVA analyses, a post-hoc test with an adjustment for multiple comparisons is performed (such as Bonferroni). I am aware that some statistical softwares (such as SPSS) perform post-hoc comparisons on unadjusted group means.

I have noticed that my predictor "disease stage" is significant in my multiple regression analysis. Therefore, I am wondering if it is possible to run a regression each time I would like to separately compare the mean of the control group to another, while controlling for the covariates. Finally, I would apply the Benjamini-Hochberg (FDR) procedure to the p values for the group differences. I understand some softwares (such as JMP) provide the regression output with all the comparisons with the reference group. However, I would think that it would be more appropriate to exclude certain pairwise comparisons from the regression. Such that we only focus on one comparison in the "post-hoc" regression model? I hope my last point makes sense.

Thank you

## Best Answer

You are much better off doing a single regression with your multi-category

`disease stage`

predictor and the covariates. With the "control" group as the reference for that predictor, you will get point estimates and standard errors/confidence intervals for the difference of each of the 3 other stages versus "control" in a single analysis that uses all of your data together. Such results are typically reported directly in a table of regression coefficients.One could also report a multiple-comparison correction. Note that the Bonferroni correction is unnecessarily conservative; see the discussion in the help page for the

`p.adjust()`

function in the basic R`stats`

package, which provides a simple way to apply any of several methods for p-value correction. If you have a specific set of pre-specified comparisons, there is no need to perform all pairwise tests and the correction need only be applied to the pre-specified comparisons.Breaking this apart into separate models for each

`disease stage`

category versus "control" throws away potentially informative data and loses power. If you think that the associations of some covariates with outcome differs depending on`disease stage`

then it's generally better to include interaction terms for those covariates with`disease stage`

rather than do separate models. That takes advantage of all your data at once, giving you maximum power to detect true effects.