Regression – Principal Components Regression: Understanding Coefficients in Terms of Original Regressors

Tags: dimensionality reduction, pca, regression, svd

For better comparability between Principal Component Regression (PCR) and methods that do not shrink the dimension of the predictors, it would be nice to obtain the coefficients on the original regressors that are implied by PCR. I have seen the claim that such a mapping can be achieved (e.g. on page 2 of this paper), but it is never spelled out in detail. I would like to know whether such a mapping can be made in a sensible way and, if yes, how.

To be more precise, take a full-rank $N \times p$ design matrix $\mathbf{X}$, $p < N$ and a response vector $\mathbf{y}$ of length $N$. Make the following Singular Value Decomposition (SVD) of $\mathbf{X}$:
$$
\mathbf{X = U D V'}
$$
where $\mathbf{U}$ is $N\times p$ with orthonormal columns, $\mathbf{D}$ is $p\times p$ and diagonal, and $\mathbf{V}$
is $p\times p$ and orthogonal. The principal components (PC) of $\mathbf{X}$ are given by $\mathbf{Z = X V}$. Now look at the PCR:

  • The coefficient estimate of the regression on all PC's is:
    $$
    \mathbf{\beta_{Z} = (Z'Z)^{-1}Z'y}
    $$
    Since $\mathbf{Z = X V}$ and $\mathbf{V}$ is orthogonal, we have $\mathbf{X = Z V'}$, so if instead of regressing $\mathbf{y}$ on $\mathbf{Z}$ we regressed $\mathbf{y}$ on $\mathbf{Z V'}$ we would simply recover the OLS coefficient (the two coefficient vectors are linked by $\mathbf{\beta_{X} = V \beta_{Z}}$):
    $$
    \mathbf{\beta_{X} = (X'X)^{-1}X'y}
    $$
  • The point of PCR is of course dimensionality reduction, so we typically choose $r \ll p$ principal components. Call them $\mathbf{Z_r = X V_r}$ where $\mathbf{V_r}$ denotes the first $r$ columns of $\mathbf{V}$. Analogous to the full rank case, we can define the $N\times p$ matrix $\mathbf{X_r}$ of rank $r$:
    $$
    \mathbf{X_r = Z_r V_r'}
    $$
    And let
    $$
    \mathbf{\beta_{X_r} = (X_r'X_r)^{+}X_r'y}
    $$
    where $\mathbf{(X_r'X_r)^{+}}$ is the Moore-Penrose pseudoinverse of $\mathbf{X_r'X_r}$, which can be computed from the SVD of $\mathbf{X_r'X_r}$ by inverting its nonzero singular values and leaving the zero ones at zero. This
    $\mathbf{\beta_{X_r}}$ should correspond to the regression on the first $r$ PC's (a short derivation of the link follows this list):
    $$
    \mathbf{\beta_{Z_r} = (Z_r'Z_r)^{-1}Z_r'y}
    $$
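
One way to make the link explicit (a short derivation using only the quantities defined above): since $\mathbf{X_r = Z_r V_r'}$ and $\mathbf{V_r' V_r = I_r}$,
$$
\mathbf{\beta_{X_r} = (V_r Z_r' Z_r V_r')^{+} V_r Z_r' y = V_r (Z_r' Z_r)^{-1} V_r' V_r Z_r' y = V_r (Z_r' Z_r)^{-1} Z_r' y = V_r \beta_{Z_r}},
$$
using the identity $\mathbf{(V_r A V_r')^{+} = V_r A^{-1} V_r'}$ for invertible $\mathbf{A}$ when $\mathbf{V_r}$ has orthonormal columns. Setting $r = p$ recovers the full-rank statement $\mathbf{\beta_{X} = V \beta_{Z}}$ from the first bullet.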

Can this $\mathbf{\beta_{X_r}}$ be interpreted as the coefficients of $\mathbf{X}$ that are implied by the PCR? If not, why not? If yes, is there more to the interpretation than the algebraic formulas? What other ways are there of mapping the PCR regression back to the original regressors?

This question can be seen as a follow-up to this question and that question. I also include a small R example with artificial data to illustrate:

# Generate some data:
N <- 20
p <- 10
beta <- runif(p)
X <- scale(matrix(rnorm(N * p), ncol = p))
y <- X %*% beta
# SVD:
SVD <- svd(X)
U <- SVD$u; D <- diag(SVD$d); V <- SVD$v
# get the first r principal components:
r <- 2
Z.r <- X %*% V[, 1 : r]
# the PCR coefficient (regression of y on the first r PCs):
beta.Z.r <- solve(crossprod(Z.r)) %*% crossprod(Z.r, y)
# map the retained PCs back to the original regressor space (rank-r reconstruction of X):
X.r <- tcrossprod(Z.r, V[, 1 : r])
library(MASS)  # for ginv(), the Moore-Penrose pseudoinverse
beta.X.r <- ginv(crossprod(X.r)) %*% crossprod(X.r, y)
# Check that the fitted values from beta.Z.r and beta.X.r are the same:
cbind(Z.r %*% beta.Z.r, X.r %*% beta.X.r)
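# An explicit numerical check of the same comparison: the two fitted-value
# vectors should coincide up to floating-point tolerance.
all.equal(c(Z.r %*% beta.Z.r), c(X.r %*% beta.X.r))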

Best Answer

As the comment by @whuber points out, the answer to the lengthy question is simply to left-multiply $\mathbf{\beta_{Z_r}}$ by $\mathbf{V_r}$, i.e. $\mathbf{\beta_{X_r} = V_r \beta_{Z_r}}$. This gives the same result as the pseudoinverse construction described in the question.
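A minimal check in R, reusing the objects V, r, beta.Z.r and beta.X.r defined in the question's code (the name beta.X.r.mapped below is new, introduced only for the comparison):

# coefficients on the original regressors implied by the PCR, via V_r %*% beta_{Z_r}:
beta.X.r.mapped <- V[, 1 : r] %*% beta.Z.r
# agrees with the pseudoinverse construction from the question (up to tolerance):
all.equal(c(beta.X.r.mapped), c(beta.X.r))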
