Two ways to get rid of multicollinearity

linear regression, statistics

I have a couple of questions concerning multicollinearity in a linear regression model $Y=X \beta + \epsilon$.

If the design matrix exhibits some multicollinearity, i.e. $\det(X^TX) \approx 0$, we can fix it by replacing it with a design matrix $Z$ whose columns form an orthogonal basis of $\operatorname{Im}(X)$. To do that, we are told to consider $X^TX$ and its diagonalised form $X^T X=U D U^T$, and to take $Z=XU$. However, I don't see how that solves multicollinearity, because $(XU)^T XU=D$, so that $\det(Z^TZ)=\det((XU)^TXU)=\det(D)=\det(X^TX)\approx 0$. So we did not solve multicollinearity?

Furthermore, I understand that this manipulation makes us lose interpretability: instead of having a column for age, height and weight, for example, we'll have a matrix with columns like 0.4 age + 0.1 height + 0.5 weight, which does not allow a nice interpretation. One way to solve this problem without losing interpretability is to use ridge regression. To do so, we start by standardising the design matrix, first writing $X=(\mathbf 1 \quad W)$ and $\beta=(\beta_0 \quad \gamma)$ (we separate the intercept, the column with 1 in each entry, from the other columns). My course then says: "we then rescale the covariates by defining $Z_j=\frac {\sqrt n}{\operatorname{sd}(W_j)} (W_j - \mathbf 1 \overline{W_j})$ so that the coefficients now have a common scale". As an explanation, our teacher said that in the initial design matrix $X$, if we change the units of measurement of one of the covariates (from meters to miles, for example), then the magnitude of the corresponding coefficient in $\beta$ will change too (I agree), and he said we won't have this problem when using the $Z_j$. I don't think this example illustrates well what he meant by "coefficients now have common scale". Can someone come up with a better explanation?

Best Answer

1. Regarding the first paragraph. The goal is to detect multicollinearity when it occurs, and then prune out redundant design vectors and lower the effective dimensionality of the model.

The singular value decomposition of the symmetric matrix $G= X^t X$ (equivalently, its eigendecomposition, since $G$ is symmetric and positive semidefinite) has the factored form $G=U DU^t$, in which $D$ is diagonal and $U$ is an orthogonal matrix (i.e. $U^t=U^{-1}$). The equation $G=UDU^t$ just means that $GU= UD$, and that means each column vector of $U$ is an eigenvector of $G$: $GU_k= d_k U_k$. In the event of multicollinearity, $D$ will have some entries equal to zero (typically the lower-rightmost ones in most output layouts). Those correspond to the columns of $U$ that are null vectors of $G$, i.e. $GU_k=0$. Toss those entries out to get a smaller diagonal matrix $D'$, and prune out the corresponding columns of $U$ to get a smaller matrix $U'$. Then you can check that it is still true that $U' D'(U')^t =G$, and you can set $Z= XU'$. I suggest you try this with a numerical example and see what happens.
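To make that concrete, here is a minimal numerical sketch in Python/numpy; the toy design matrix and the eigenvalue tolerance are my own choices for illustration, not part of the original question.

```python
import numpy as np

# Toy design matrix with an exact collinearity: the third column is col1 + col2
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
X = np.column_stack([x1, x2, x1 + x2])

G = X.T @ X
d, U = np.linalg.eigh(G)              # G = U D U^t, eigenvalues in ascending order
print(d)                              # one eigenvalue is numerically zero

keep = d > 1e-10 * d.max()            # drop the (near-)zero eigenvalues ...
U_prime = U[:, keep]                  # ... and the corresponding columns of U
D_prime = np.diag(d[keep])

print(np.allclose(U_prime @ D_prime @ U_prime.T, G))  # still reproduces G
Z = X @ U_prime                       # reduced design: 50 x 2 instead of 50 x 3
print(np.linalg.det(Z.T @ Z))         # now well away from zero
```

Since $\det(Z^tZ)$ is the product of the retained eigenvalues, the pruned design no longer looks singular, which is the point of throwing away the zero part of $D$.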

2. Regarding the rescaling. The $Z$ vectors are created by first mean-centering each design vector $W_j$. Then the mean-centered version of $W_j$ is rescaled so that it has total length 1 (or, in your textbook's formula, perhaps length $\sqrt{n}$). With the length-1 scaling this conveniently ensures that the entries of $G$ will now be numbers between $\pm 1$; in fact the off-diagonal entries are exactly the sample correlations of the covariates.
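A short sketch of the same idea (again numpy; the covariate scales and the unit-conversion factor are made up for illustration): after centring and rescaling, the Gram matrix contains only correlations, and changing the units of a covariate leaves the standardised columns unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
# Two covariates on very different scales, e.g. height in metres and weight in grams
W = np.column_stack([rng.normal(1.7, 0.1, n), rng.normal(70_000, 5_000, n)])

Wc = W - W.mean(axis=0)                # mean-centre each column
Z = Wc / np.linalg.norm(Wc, axis=0)    # rescale each column to length 1

print(Z.T @ Z)                         # 1s on the diagonal, sample correlation off-diagonal

# Changing the units of a covariate (metres -> miles) only multiplies that column
# by a positive constant, which the rescaling cancels, so Z is unchanged.
W_miles = W.copy()
W_miles[:, 0] /= 1609.34
Wc2 = W_miles - W_miles.mean(axis=0)
Z2 = Wc2 / np.linalg.norm(Wc2, axis=0)
print(np.allclose(Z, Z2))              # True
```

Because $Z$ no longer carries the units of the original covariates, the magnitudes of the coefficients fitted on $Z$ can be compared directly across covariates, which is one reasonable reading of "coefficients now have common scale".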

However, it will do nothing to solve the issue of multicollinearity if $\vec 1$ already lies in the span of your design vectors (as is commonly the case). On the other hand, if $\vec 1$ is not in the linear span of your given design vectors, then adding this perturbation usually breaks the degeneracy in the design span.
