Solved – Reporting linear regression with Post-hoc comparisons

anovapost-hocregression

I have a quantitative variable (Dim: which is actually factor scores) and 2 categorical variables (Category + Region) in my dataset. I am interested in knowing if the variance explained by each categorical variable (and their interaction) is significant. Additionally I also want to know how much overall variance is explained by the model (i.e. the R squared value). After consulting materials about this topic I came to know that following 2 commands in R can fulfill my requirements.

my_lm <- lm(Dim4 ~ Category*Region, data=df)
summary(my_lm)
Anova(my_lm, type = "III")

I'm also interested in reporting whether group comparisons within each category are significantly different from each other. For this purpose, TukeyHSD has been mentioned in the reference material.

TukeyHSD(aov(my_lm), ordered=TRUE)

This command outputs the whole range of group comparisons for example:

$Category
                            diff          lwr      upr     p adj
Tweets-Conversations   0.6843803 -0.279052730 1.647813 0.2961601
Comments-Conversations 1.6552203  0.665253423 2.645187 0.0000540
$Region
          diff       lwr      upr p adj
US-PK 1.146865 0.7036846 1.590046 5e-07

$`Category:Region`
                                        diff         lwr      upr     p adj
Conversations:US-Tweets:PK        0.04138269 -1.46974284 1.552508 1.0000000
Conversations:PK-Tweets:PK        0.30818926 -1.31628692 1.932665 0.9998612
Comments:PK-Tweets:PK             1.20670619 -0.35032424 2.763737 0.2931398
Tweets:US-Tweets:PK               1.78097509  0.24027735 3.321673 0.0097448

While I am interested in all comparisons of Category and Region individually, in the interaction subgroups I am only interested in certain comparisons, e.g. Tweets:US-Tweets-PK, Conversations:US-Conversations-PK.
I have 2 questions regarding reporting the results of these tests. Firstly,How should I report the results of linear regression model (e.g. in case of Anova there would be F-statistic, p value and degrees of freedom like this One-way ANOVA: F(9, 3085) = 70.23, p < .001;). But I am looking at my data in terms of linear regression and want to report r squared + significance of each variable and their interaction as well. Does that involve reporting degrees of freedom and F statistic?
Secondly, as mentioned above I am only interested in certain group comparisons when it comes to interaction effects, can I just report the selected group comparisons? Another relevant question is if I'm not interested in all comparisons, can I limit the output of post hoc test to my desired group comparisons only?

Best Answer

You could take a look at the emmeans (estimated marginal means) package in R. It provides a lot of flexibility for these kinds of purposes. Since it appears you do have an interaction of consequence, I recommend against doing main-effects comparisons of either factor, averaged over the other. Instead, get a good display of how the interaction plays out, e.g. an interaction plot:

require("emmeans")
emmip(my_lm, Category ~ Region)

and display the EMMs and comparisons of one factor's levels separately by the other:

emm = emmeans(my_lm, ~ Category * Region)
emm
pairs(emm, by = "Category")
pairs(emm, by = "Region")

If you do not want all pairwise comparisons even with a by variable, there are several other built-in contrast types, or you can specify your own. For example, to compare consecutive levels of Category for each Region, do

contrast(emm, "consec", by = "Region")

There are a lot of examples and discussion in the several vignettes in the package. Start with vignette("basics")