This is an interesting question, and actually quite subtle. It gets at the core of the definition of interactions, as well as the assumptions underlying dummy coding of categorical variables.
OP's question is: when we test for interactions between the levels as given above, why do we (seemingly!) not bother to take into account interactions between one variable and the other variable's reference level?
The short answer is that we actually are taking these into account, as I'll explain here. First, to simplify subsequent notation, let $Y_{AE}$ denote the random variable $Y|\{X_1=A\} \cap \{X_2=E\}$, that is, $Y$ conditional on observing levels $A$ and $E$ in variables $X_1$ and $X_2$, respectively.
Now, recall that if we claim that there is no interaction between $X_1$ and $X_2$, this is equivalent to asserting that
$$
E[Y_{BE}]-E[Y_{AE}] = E[Y_{BF}]-E[Y_{AF}] = E[Y_{BG}]-E[Y_{AG}] \qquad \text{(Eq. 1)}
$$
that is, the expected difference between $Y|X_1=B$ and $Y|X_1=A$, computed with $X_2$ held fixed, does not depend on the level at which $X_2$ is held. Similarly, no interaction means that
$$ E[Y_{AE}]-E[Y_{AF}] = \cdots = E[Y_{DE}]-E[Y_{DF}]. $$
Now, OP correctly states that given our choice of reference levels $A$ and $E$, the definition of $\beta_1$ is the expected difference between $Y$ when $X_1=B$ versus $X_1=A$, given that $X_2$ is held constant at $E$. In other words,
$$ \beta_1 = E[Y_{BE}] - E[Y_{AE}], $$
but if there is no interaction between $X_1$ and $X_2$, it follows immediately from equation 1 above that
$$
\beta_1 = E[Y_{BE}]-E[Y_{AE}] = E[Y_{BF}]-E[Y_{AF}] = E[Y_{BG}]-E[Y_{AG}].
$$
In other words, if there is no interaction, then $\beta_1$ is sufficient to explain the differences in $Y$ based on levels of $X_1$ when $X_2$ is held constant. This is why the test for an interaction involves leaving $\beta_1$ in the model and testing whether the interaction terms are zero.
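As a concrete illustration, here is a minimal R sketch (the level labels match the notation above, but the data and effect sizes are invented): fit the additive model and the interaction model, then compare them with a partial F-test of the interaction terms.

set.seed(1)
# Two hypothetical factors with levels matching the notation above
x1 <- factor(rep(c("A", "B", "C", "D"), each = 30))
x2 <- factor(rep(c("E", "F", "G"), times = 40))
y  <- 1 + 2*(x1 == "B") + 3*(x2 == "F") + rnorm(120)  # no true interaction

fit_add <- lm(y ~ x1 + x2)  # main effects only: beta_1, ..., kept in both models
fit_int <- lm(y ~ x1 * x2)  # adds the (4-1)*(3-1) = 6 interaction terms
anova(fit_add, fit_int)     # F-test that all interaction coefficients are zero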
The full answer to your question involves a simultaneous derivation for $\beta_1$ through $\beta_5$, but hopefully this example conveys the idea of what's going on.
The answer in short is "contrasts". I had heard of them before but had never learned what they are; in fact, they serve exactly the purpose I was looking for. Contrasts alter what the coefficients mean: a given model always has the same number of coefficients, but contrasts change them into different linear combinations of the default coefficients. That is, the model is reparameterized: same model, different coefficient meanings and values.
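For instance, here is a minimal sketch (with a made-up factor f and response y) showing that refitting the same model with a different contrast changes the coefficients but leaves the fit untouched:

set.seed(1)
f <- gl(3, 10, labels = c("a", "b", "c"))  # hypothetical 3-level factor
y <- as.numeric(f) + rnorm(30)

m1 <- lm(y ~ f)                                     # default: contr.treatment
m2 <- lm(y ~ f, contrasts = list(f = "contr.sum"))  # reparameterized: contr.sum
coef(m1); coef(m2)                 # different coefficients and meanings...
all.equal(fitted(m1), fitted(m2))  # ...but identical fitted values: TRUE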
Also, additional comparisons of other "virtual parameters" (linear combinations of the actual parameters) that do not appear in the model summary can be made after the model is fit, using the multcomp package, although this is not necessarily simple. (The "contrasts" package may also do this, though I couldn't figure out how to use it.)
The default contrast in R is "contr.treatment", and it leads to the coefficients in the problem statement, with each level being compared to the base level within its factor. Another contrast, "contr.sum", generates coefficients that compare each level to the mean of all the levels of that factor.
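For a three-level factor, the two coding matrices look like this:

> contr.treatment(3)   # each column: one level versus the first (reference) level
  2 3
1 0 0
2 1 0
3 0 1
> contr.sum(3)         # each column sums to zero; compares levels to the grand mean.
  [,1] [,2]            # Note the unnamed columns.
1    1    0
2    0    1
3   -1   -1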
In my case, I want the outer factor to keep using the "contr.treatment" contrast, because I do want the outer factor to compare to the control level, jA. For the inner factor, though, I want to use "contr.sum", so that each k level is compared to the mean of all k levels within that j level. This is done after creating the k factor, with the assignment statement: contrasts(k) = "contr.sum"
The full code now looks like this:
> j = gl(3, 30, labels=LETTERS[1:3])
> k = gl(3, 10, length=90, labels=LETTERS[13:15])
> k[1:30] = "M" # Reference j="A" has k="M" only
> contrasts(k) = "contr.sum"
> x = 100*(2^as.numeric(k)) # 200,400,800 for k=M,N,O
> x[1:30] = mean(c(200,400,800)) # Reference j="A" has x=same mean as "B" and "C"
> x = x + rnorm(90, 0, 4) # A little randomness
> x = x + rep(c(0, 1, 1.2), each=30) # A slight non-significant deviation by j.
> df = data.frame(x=x, j=j, k=k)
> f = lm(x ~ j/k, data=df, drop.unused.levels=TRUE)
> summary(f)
Call:
lm(formula = x ~ j/k, data = df, drop.unused.levels = TRUE)
Residuals:
Min 1Q Median 3Q Max
-10.6628 -2.9090 -0.4446 2.9412 10.0984
Coefficients: (2 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 467.3444 0.7878 593.259 <2e-16 ***
jB 0.6631 1.1141 0.595 0.553
jC 0.4530 1.1141 0.407 0.685
jA:k1 NA NA NA NA
jB:k1 -266.9330 1.1141 -239.604 <2e-16 ***
jC:k1 -267.5542 1.1141 -240.162 <2e-16 ***
jA:k2 NA NA NA NA
jB:k2 -66.0202 1.1141 -59.261 <2e-16 ***
jC:k2 -65.7784 1.1141 -59.044 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.315 on 83 degrees of freedom
Multiple R-squared: 0.9996, Adjusted R-squared: 0.9996
F-statistic: 3.344e+04 on 6 and 83 DF, p-value: < 2.2e-16
The coefficients are named "k1" and "k2" instead of "kM" and "kN" because contr.sum, unlike contr.treatment, does not label its contrast columns with the level names, so R simply numbers them. The results are now what is expected, as verified here:
> mean(df$x[df$j == "A"])
[1] 467.3444
> mean(df$x[df$j == "B"])-mean(df$x[df$j == "A"])
[1] 0.6631321
> mean(df$x[df$j == "C"])-mean(df$x[df$j == "A"])
[1] 0.4530243
> mean(df$x[df$j == "B" & df$k == "M"])-mean(df$x[df$j == "B"])
[1] -266.933
> mean(df$x[df$j == "C" & df$k == "M"])-mean(df$x[df$j == "C"])
[1] -267.5542
> mean(df$x[df$j == "B" & df$k == "N"])-mean(df$x[df$j == "B"])
[1] -66.02023
> mean(df$x[df$j == "C" & df$k == "N"])-mean(df$x[df$j == "C"])
[1] -65.77839
The comparisons of jB:kO and jC:kO (aka jB:k3 and jC:k3) are still missing. These do not appear as coefficients because they are simple linear combinations of the existing coefficients. In fact, with the "contr.sum" contrast, the value these coefficients would have is jB:k3 = -(jB:k1+jB:k2) and jC:k3 = -(jC:k1+jC:k2), as can be verified with a little algebra, remembering that coefficient jB is the mean of level j=B minus the intercept, and jB:k1 is the mean of the (j=B, k=M) cell minus the mean of level j=B.
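In symbols, the sum-to-zero coding constrains the within-$j$ effects of $k$ to satisfy
$$ \beta_{jB:k1} + \beta_{jB:k2} + \beta_{jB:k3} = 0, $$
so the omitted third effect is recoverable as $\beta_{jB:k3} = -(\beta_{jB:k1} + \beta_{jB:k2})$, and likewise for jC.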
To use that knowledge to do the other two tests that I want, the glht() function in the multcomp package can be used. However, it chokes on the NA coefficient values that resulted from level j="A" containing only k="M", so we must modify the model matrix to remove the columns corresponding to the NA coefficients, as well as the intercept column:
> # With contr.sum contrasts, the jB:k3 term is the negative of jB:k1+jB:k2
> summary(glht(f, linfct="0 - jB:k1 - jB:k2 = 0"))
Error in summary(glht(f, linfct = "0 - jB:k1 - jB:k2 = 0")) :
error in evaluating the argument 'object' in selecting a method for function 'summary': Error in modelparm.default(model, ...) :
dimensions of coefficients and covariance matrix don't match
>
> # It fails because of NA coefficients. They can be removed by removing their
> # columns in the model matrix and then updating the model.
> M = model.matrix(f)
> # Save the "assign" and "contrasts" attributes of the model matrix.
> attr.assign = attr(M, "assign")
> attr.contrasts = attr(M, "contrasts")
> rmvCols = c(1, which(summary(f)$aliased))
> M = M[,-rmvCols] # Remove intercept column (required) and NA columns
> # Reassign the "assign" attribute, removing the corresponding elements from it.
> attr(M, "assign") = attr.assign[-rmvCols]
> # Reassign the "contrasts" attribute to its original value
> attr(M, "contrasts") = attr.contrasts
Now we update the model with the new model matrix and try glht() again.
> f1 = update(f, ~M)
> f1
Call:
lm(formula = x ~ M, data = df, drop.unused.levels = TRUE)
Coefficients:
(Intercept) MjB MjC MjB:k1 MjC:k1 MjB:k2 MjC:k2
466.9725 0.2974 -0.4174 -266.9183 -267.7272 -65.2277 -66.5216
> # Now try glht using the updated model fit. An "M" must be prepended to the
> # coefficient names because the model matrix variable was named M.
> summary(glht(f1, linfct="0 - MjB:k1 - MjB:k2 = 0"))
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = x ~ M, data = df, drop.unused.levels = TRUE)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
0 - MjB:k1 - MjB:k2 == 0 332.953 1.114 298.9 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
Excellent! This is one of the two missing comparisons I wanted!
To confirm that the estimate is correct:
> coef(f)
(Intercept) jB jC jA:k1 jB:k1 jC:k1 jA:k2 jB:k2 jC:k2
467.3444162 0.6631321 0.4530243 NA -266.9329982 -267.5542422 NA -66.0202315 -65.7783877
> -(coef(f)[5] + coef(f)[8])
jB:k1
332.9532
Yes, it is correct!
Finally, do both missing comparisons, jB:k3 and jC:k3, at the same time:
> summary(glht(f1, linfct=c("0 - MjB:k1 - MjB:k2 = 0", "0 - MjC:k1 - MjC:k2 = 0")))
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = x ~ M, data = df, drop.unused.levels = TRUE)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
0 - MjB:k1 - MjB:k2 == 0 332.953 1.114 298.9 <1e-10 ***
0 - MjC:k1 - MjC:k2 == 0 333.333 1.114 299.2 <1e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
Sweet!
Best Answer
That comparison of one level with the mean of all later levels is (aside from scale) called Helmert coding, or Helmert contrasts. The one you give is the first contrast; the other would be a scaled version of $(0, 1, -1)^\top$.
What R calls Helmert coding, other references call "reverse Helmert"; the two schemes are equivalent up to reversing the order of the levels.
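To make the difference concrete, here is a small sketch: R's built-in contr.helmert compares each level with the mean of the preceding levels, while a forward Helmert matrix, built by hand with one possible scaling, compares each level with the mean of the later levels.

> contr.helmert(3)   # R's ("reverse") Helmert: each level vs the mean of earlier levels
  [,1] [,2]
1   -1   -1
2    1   -1
3    0    2

# Forward Helmert: each level vs the mean of all later levels; the columns are
# scaled versions of (2, -1, -1) and (0, 1, -1), as described above.
fwd = cbind(c(2, -1, -1)/3, c(0, 1, -1)/2)
contrasts(k) = fwd   # assuming k is a 3-level factor, as in the question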