Solved – Is PCA a non-linear transform?

Tags: dimensionality-reduction, linear, nonlinear, pca, python

In the article Relative Information Loss in the PCA, the authors make the following statement in the introductory section:

In case the orthogonal matrix is not known a priori, but has to be estimated from a set of input data vectors collected in the matrix $\underline{X}$, the PCA becomes a nonlinear operation:

$$\underline{Y} = \underline{w}(\underline{X})\underline{X}$$

Here, $\underline{w}$ is a matrix-valued function which computes the orthogonal matrix required for rotating the data (e.g., using the QR algorithm).

This statement contrasts with most descriptions of PCA, which regard it as a linear transformation.
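For concreteness, $\underline{w}(\underline{X})$ can be realized by estimating the principal axes from the data itself. Here is a minimal sketch of that step (my own illustration, using the SVD of the centered data matrix rather than the QR algorithm mentioned in the quote, and the rows-as-samples convention, so the rotation is applied on the right):

import numpy

def estimate_rotation(X):
    # w(X): the orthogonal matrix is estimated from the data itself,
    # here as the right singular vectors of the centered data matrix
    Xc = X - numpy.mean(X, axis = 0)
    _, _, Vt = numpy.linalg.svd(Xc, full_matrices = False)
    return Vt.T  # columns are the estimated principal axes

X = numpy.random.normal(size = (100, 3))
W = estimate_rotation(X)                 # W is itself a function of X
Y = (X - numpy.mean(X, axis = 0)) @ W    # the PCA scores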

I designed a toy experiment to check linearity via the additivity property: $f(a + b) = f(a) + f(b)$.

import numpy
from sklearn.decomposition import PCA

if __name__ == '__main__':
    numpy.random.seed(42)
    m = 100
    d = 3
    X = numpy.random.normal(size = (m, d))

    # Center data
    X -= numpy.mean(X, axis = 0)

    pca = PCA(n_components = d)
    pca.fit(X)
    Y = pca.transform(X)

    # Check additivity, pca(a + b) = pca(a) + pca(b)
    for i in range(m):
        for j in range(m):
            # 'diff', not 'd': the name 'd' already holds the dimension
            diff = pca.transform([X[i] + X[j]]) - (Y[i] + Y[j])
            assert numpy.allclose(diff, 0.0)

The expression $f(a + b) - (f(a) + f(b))$, where $f = \mathrm{PCA}$, is the zero vector (up to numerical precision) for every pair, so I conclude that the transform (PCA) is linear.
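(Additivity is only half of linearity; the homogeneity property $f(\alpha a) = \alpha f(a)$ can be checked the same way. A minimal sketch, reusing pca, X, Y, and m from the script above:)

# Check homogeneity, pca(alpha * a) = alpha * pca(a)
for alpha in (-2.0, 0.5, 3.0):
    for i in range(m):
        diff = pca.transform([alpha * X[i]]) - alpha * Y[i]
        assert numpy.allclose(diff, 0.0)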

What am I missing, then? Why is PCA considered non-linear when the orthogonal matrix (the matrix of the principal components) is estimated from $X$ (see the quote above)?

Best Answer

I think the confusion comes from what exactly is meant to be linear or non-linear here.

Using the notation of your quote, operation $w(X)$ maps a data matrix $X$ into a projector $P_k$ on the first $k$ principal axes of $X$. Let us be completely clear about the notation here; for simplicity let us fix $k=1$ and assume that $X$ is centered. Then $X\in\mathbb R^{n\times p}$ and $P\in \mathbb P^p \subset \mathbb R^{p\times p}$, where by $\mathbb P^p$ I mean the space of all matrices of the form $P=\mathbf{uu}^\top$ with $\mathbf u\in \mathbb R^p$ and $\|\mathbf u\|=1$.
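(As a quick sanity check on this definition: any $P = \mathbf{uu}^\top$ with $\|\mathbf u\| = 1$ is symmetric and idempotent, the two defining properties of an orthogonal projector. A minimal sketch:)

import numpy

numpy.random.seed(0)
u = numpy.random.normal(size = 5)
u /= numpy.linalg.norm(u)          # unit vector, so P has rank 1
P = numpy.outer(u, u)              # P = u u^T

assert numpy.allclose(P @ P, P)    # idempotent
assert numpy.allclose(P, P.T)      # symmetric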

Now:

  • Operation $w:\mathbb R^{n\times p} \to \mathbb P^p$ is non-linear.
  • Operation $P:\mathbb R^p \to \mathbb R^p$, $\mathbf x \mapsto P\mathbf x$, is linear (and so is the one-dimensional score $\mathbf x \mapsto \mathbf u^\top \mathbf x$).

The quote talks about the $w(\cdot)$ function; it transforms a data matrix into a projection operator. It is non-linear. Your script investigates the $P(\cdot)$ function; it transforms a high-dimensional vector into a low-dimensional PCA projection, given a fixed dataset. It is linear.
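To make the non-linearity of $w(\cdot)$ concrete: the principal axes of $X_1 + X_2$ are in general not the sum (or any simple combination) of the axes of $X_1$ and $X_2$. A minimal sketch (the datasets and their anisotropic scalings are my own choices, purely for illustration):

import numpy
from sklearn.decomposition import PCA

numpy.random.seed(0)
# Two datasets with different dominant directions
X1 = numpy.random.normal(size = (100, 3)) * numpy.array([10.0, 1.0, 1.0])
X2 = numpy.random.normal(size = (100, 3)) * numpy.array([1.0, 10.0, 1.0])

def w(X):
    # Principal axes of X, as the rows of the components matrix
    return PCA(n_components = 3).fit(X).components_

# w(X1 + X2) != w(X1) + w(X2): the sum of two orthogonal matrices
# is in general not even orthogonal
assert not numpy.allclose(w(X1 + X2), w(X1) + w(X2))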

So $w(\cdot)$ is a non-linear mapping into linear projections.

No contradiction.
