I cannot understand the usage of polynomial contrasts in regression fitting. In particular, I am referring to an encoding used by R
in order to express an interval variable (ordinal variable with equally spaced levels), described at this page.
In the example on that page, if I understood correctly, R fits a model for an interval variable, returning coefficients that weight its linear, quadratic, and cubic trends. Hence, the fitted model should be:
$${\rm write} = 52.7870 + 14.2587X - 0.9680X^2 - 0.1554X^3,$$
where $X$ takes the values $1$, $2$, $3$, or $4$ according to the level of the interval variable.
Is this correct? And, if so, what was the purpose of polynomial contrasts?
Best Answer
Just to recap (and in case the OP's hyperlinks fail in the future), we are looking at a dataset hsb2, which can be imported here.
We turn the variable read into an ordered / ordinal variable, readcat. Now we are all set to just run a regular ANOVA - yes, it is R, and we basically have a continuous dependent variable, write, and an explanatory variable with multiple levels, readcat. In R we can use lm(write ~ readcat, hsb2).
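As a runnable sketch of this setup: the data frame below is a made-up stand-in (the real example uses the UCLA hsb2 dataset, so these numbers are not the OP's; only the variable names read, write and readcat follow the post):

```r
# Hypothetical stand-in for the hsb2 data (only the variable names match the post)
hsb2 <- data.frame(
  read  = rep(c(35, 45, 55, 65), each = 5),  # four clusters of reading scores
  write = 1:20                               # made-up writing scores
)
# Bin `read` into 4 equal-width, ordered categories, as on the linked page
hsb2$readcat <- cut(hsb2$read, 4, ordered_result = TRUE)
fit <- lm(write ~ readcat, hsb2)
coef(fit)  # intercept plus readcat.L, readcat.Q, readcat.C terms
```

Because readcat is an ordered factor, lm applies contr.poly to it by default, which is why the coefficients come out labelled .L (linear), .Q (quadratic) and .C (cubic).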
1. Generating the contrast matrix:
There are four different levels of the ordered variable readcat, so we'll have $n-1=3$ contrasts. First, let's go for the money and take a look at the built-in R function:
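The built-in function in question is contr.poly; for four levels it returns the three orthonormal contrast columns directly:

```r
contr.poly(4)
#              .L   .Q         .C
# [1,] -0.6708204  0.5 -0.2236068
# [2,] -0.2236068 -0.5  0.6708204
# [3,]  0.2236068 -0.5 -0.6708204
# [4,]  0.6708204  0.5  0.2236068
```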
Now let's dissect what went on under the hood:
$y = \text{scores} - \text{mean(scores)} = \small [-1.5, -0.5, 0.5, 1.5]$ (the centered default scores $1, 2, 3, 4$)
$\small \text{seq\_len(n) - 1} = [0, 1, 2, 3]$ (the exponents)
$\small\begin{bmatrix} 1&-1.5&2.25&-3.375\\1&-0.5&0.25&-0.125\\1&0.5&0.25&0.125\\1&1.5&2.25&3.375 \end{bmatrix}$
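These first steps can be reproduced directly in R, mirroring the internals of contr.poly for $n = 4$:

```r
n <- 4
scores <- 1:n                       # the default scores in contr.poly
y <- scores - mean(scores)          # centered scores: -1.5 -0.5  0.5  1.5
X <- outer(y, seq_len(n) - 1, "^")  # each column: y raised to 0, 1, 2, 3
X                                   # reproduces the matrix shown above
```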
What happened there? outer(a, b, "^") raises the elements of a to the powers given by the elements of b, so that the first column results from the operations $\small(-1.5)^0$, $\small(-0.5)^0$, $\small 0.5^0$ and $\small 1.5^0$; the second column from $\small(-1.5)^1$, $\small(-0.5)^1$, $\small0.5^1$ and $\small1.5^1$; the third from $\small(-1.5)^2=2.25$, $\small(-0.5)^2 = 0.25$, $\small0.5^2 = 0.25$ and $\small1.5^2 = 2.25$; and the fourth from $\small(-1.5)^3=-3.375$, $\small(-0.5)^3=-0.125$, $\small0.5^3=0.125$ and $\small1.5^3=3.375$.

Next we do a $QR$ decomposition of this matrix, whose $Q$ part has orthonormal columns, and take its compact representation (c_Q = qr(X)$qr). Some of the inner workings of the QR-factorization functions used in this post are further explained here.

$\small\begin{bmatrix} -2&0&-2.5&0\\0.5&-2.236&0&-4.584\\0.5&0.447&2&0\\0.5&0.894&-0.9296&-1.342 \end{bmatrix}$
... of which we save the diagonal only (z = c_Q * (row(c_Q) == col(c_Q))). What lies in the diagonal: just the diagonal entries of the $\bf R$ part of the $QR$ decomposition, which the compact form stores in its upper triangle. Just? Well, no... it turns out that the diagonal of an upper triangular matrix contains the eigenvalues of the matrix! Next we call the following function:
raw = qr.qy(qr(X), z), the result of which can be replicated "manually" by two operations: 1. turning the compact form of $Q$, i.e. qr(X)$qr, into $Q$, a transformation that can be achieved with Q = qr.Q(qr(X)), and 2. carrying out the matrix multiplication $Qz$, as in Q %*% z.

Crucially, multiplying $\bf Q$ by the diagonal of $\bf R$ (its eigenvalues) does not change the orthogonality of the constituent column vectors, but since the later diagonal entries are comparatively small in absolute value, the multiplication $Qz$ will tend to shrink the values in the higher-order polynomial columns. Compare the values in the later column vectors (quadratic and cubic) before and after the $QR$ factorization operations with the relatively unaffected first two columns.
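The QR steps just described, in runnable form; the equivalence between qr.qy and the "manual" two-step version can be checked directly:

```r
y <- (1:4) - mean(1:4)                # centered scores
X <- outer(y, 0:3, "^")               # the 4x4 matrix of powers from above
QR  <- qr(X)
c_Q <- QR$qr                          # compact QR representation
z   <- c_Q * (row(c_Q) == col(c_Q))   # keep the diagonal only (diagonal of R)
raw <- qr.qy(QR, z)                   # computes Q %*% z without forming Q
all.equal(raw, qr.Q(QR) %*% z)        # the "manual" version agrees
```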
Finally we call

(Z <- sweep(raw, 2L, apply(raw, 2L, function(x) sqrt(sum(x^2))), "/", check.margin = FALSE))

turning the matrix raw into a matrix of orthonormal column vectors. This function simply "normalizes" the matrix by dividing ("/") each element columnwise by $\small\sqrt{\sum_\text{col.} x_i^2}$. So it can be decomposed into two steps: $(\text{i})$ apply(raw, 2, function(x) sqrt(sum(x^2))), resulting in the column norms 2 2.236 2 1.342, which are the denominators for each column in $(\text{ii})$, where every element in a column is divided by the corresponding value from $(\text{i})$.

At this point the column vectors form an orthonormal basis of $\mathbb{R}^4$. We then get rid of the first column, which corresponds to the intercept, and we have reproduced the result of contr.poly(4):
:$\small\begin{bmatrix} -0.6708204&0.5&-0.2236068\\-0.2236068&-0.5&0.6708204\\0.2236068&-0.5&-0.6708204\\0.6708204&0.5&0.2236068 \end{bmatrix}$
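Putting the whole pipeline together reproduces contr.poly(4) exactly, and crossprod confirms the orthonormality of the columns in one shot:

```r
y <- (1:4) - mean(1:4)
X <- outer(y, 0:3, "^")
QR  <- qr(X)
z   <- QR$qr * (row(QR$qr) == col(QR$qr))
raw <- qr.qy(QR, z)
# Normalize each column by its Euclidean norm
Z <- sweep(raw, 2L, apply(raw, 2L, function(x) sqrt(sum(x^2))), "/")
all.equal(Z[, -1], contr.poly(4), check.attributes = FALSE)  # drop intercept col
round(crossprod(Z), 10)  # t(Z) %*% Z is the 4x4 identity: orthonormal columns
```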
The columns of this matrix are orthonormal, as can be shown by sqrt(sum(Z[,3]^2)) = 1 and Z[,3] %*% Z[,4] = 0 (checks carried out on the four-column matrix Z, before the intercept column is dropped), for example; incidentally, the same goes for the rows. And each column is the result of orthogonalizing the initial $\text{scores} - \text{mean}$ raised to the $1$-st, $2$-nd and $3$-rd power, respectively - i.e. linear, quadratic and cubic.

2. Which contrasts (columns) contribute significantly to explaining the differences between levels of the explanatory variable?
We can just run the ANOVA and look at the summary...
summary(lm(write ~ readcat, hsb2))
... to see that there is a linear effect of readcat on write, so that the original values (in the third chunk of code in the beginning of the post) can be reproduced from the fitted coefficients.
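One way to sketch that reproduction (using the hypothetical stand-in data from earlier, not the real hsb2 numbers): since the one-way model is saturated in readcat, the fitted value for each level is exactly the group mean, and it can be rebuilt from the coefficients and the contrast matrix:

```r
# Hypothetical stand-in data (only the variable names match the post)
hsb2 <- data.frame(read = rep(c(35, 45, 55, 65), each = 5), write = 1:20)
hsb2$readcat <- cut(hsb2$read, 4, ordered_result = TRUE)
fit <- lm(write ~ readcat, hsb2)
# Each group mean = intercept + (contrast row) . (L, Q, C coefficients)
rebuilt  <- drop(cbind(1, contr.poly(4)) %*% coef(fit))
observed <- tapply(hsb2$write, hsb2$readcat, mean)
all.equal(unname(rebuilt), unname(observed))  # the group means are recovered
```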
Being orthogonal contrasts, the sum of their components is zero, $\displaystyle \sum_{i=1}^t a_i = 0$ for constants $a_1,\cdots,a_t$, and the dot product of any two of them is zero. If we could visualize them, they would look something like this:
The idea behind orthogonal contrasts is that the inferences we can extract (in this case, generating coefficients via a linear regression) will be the result of independent aspects of the data. This would not be the case if we simply used $X^0, X^1, \cdots, X^n$ as contrasts.
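A quick check of both properties, and of why raw powers would not do; note that the centered quadratic column does not even sum to zero, so it is not a contrast at all:

```r
Z <- contr.poly(4)
round(colSums(Z), 10)      # every contrast sums to zero
round(crossprod(Z), 10)    # pairwise dot products are zero (identity matrix)

# Compare: centered raw powers y^1, y^2, y^3 are neither contrasts nor orthogonal
y <- (1:4) - mean(1:4)
raw_powers <- outer(y, 1:3, "^")
colSums(raw_powers)        # quadratic column sums to 5, not 0
crossprod(raw_powers)      # linear and cubic columns overlap (entry [1,3] = 10.25)
```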
Graphically, this is much easier to understand. Compare the actual means by group (the large black squares) to the predicted values, and see why a straight-line approximation with minimal contribution from the quadratic and cubic polynomials (with curves only approximated with loess) is optimal:
If, just for effect, the coefficients for the quadratic and cubic contrasts had been as large as the one for the linear contrast, the nonsensical plot that follows would depict more clearly the polynomial contributions of each contrast:
The code is here.