Solved – Properties of PCA for dependent observations

Tags: iid, non-independent, pca, time-series

We usually use PCA as a dimensionality reduction technique for data where cases are assumed to be i.i.d.

Question: What are the typical nuances in applying PCA for dependent, non-i.i.d. data? What nice/useful properties of PCA that hold for i.i.d. data are compromised (or lost entirely)?

For example, the data could be a multivariate time series in which case autocorrelation or autoregressive conditional heteroskedasticity (ARCH) could be expected.
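For concreteness, here is a minimal sketch of that setting (the VAR(1) coefficients, dimensions and sample size are arbitrary assumptions, not from the question): an autocorrelated multivariate series is generated and handed to ordinary PCA, which treats the rows as exchangeable and ignores their temporal ordering.

```python
# Simulate a multivariate time series with autocorrelation (a VAR(1) process)
# and apply ordinary PCA to the rows, exactly as one would for i.i.d. data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
T, p = 500, 3                        # series length and number of variables (arbitrary)
A = np.diag([0.9, 0.5, 0.1])         # VAR(1) coefficient matrix -> autocorrelated rows

X = np.zeros((T, p))
for t in range(1, T):
    X[t] = A @ X[t - 1] + rng.standard_normal(p)   # x_t = A x_{t-1} + eps_t

pca = PCA().fit(X)                   # nothing here "knows" that the rows are dependent
print(pca.explained_variance_ratio_)
```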

Several related questions on applying PCA to time series data have been asked before, e.g. 1, 2, 3, 4, but I am looking for a more general and comprehensive answer (without the need to expand much on each individual point).

Edit: As noted by @ttnphns, PCA itself is not an inferential analysis. However, one could be interested in the generalization performance of PCA, i.e. in how the sample PCA relates to its population counterpart. E.g., as written in Nadler (2008):

Assuming the given data is a finite and random sample from a (generally unknown) distribution, an interesting theoretical and practical question is the relation between the sample PCA results computed from finite data and those of the underlying population model.
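To make that relation concrete, here is a minimal simulation sketch (the VAR(1) setup and all numbers are assumptions chosen for illustration, not taken from Nadler's paper): it compares the leading eigenvector of the sample covariance with the leading eigenvector of the population (stationary) covariance, for autocorrelated rows versus i.i.d. rows sharing the same covariance. With a setup like this one would typically see a larger misalignment for the dependent rows, because autocorrelation reduces the effective sample size.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(1)
p, T, n_rep, burn = 3, 200, 500, 100
A = np.diag([0.9, 0.5, 0.1])              # VAR(1) coefficients (assumed for illustration)
Q = np.eye(p)                             # innovation covariance
Sigma = solve_discrete_lyapunov(A, Q)     # stationary covariance: Sigma = A Sigma A' + Q
v_pop = np.linalg.eigh(Sigma)[1][:, -1]   # leading eigenvector of the population covariance

def leading_eigvec(X):
    """Leading eigenvector of the sample covariance of the rows of X."""
    return np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1]

err_dep, err_iid = [], []
for _ in range(n_rep):
    # Dependent sample: a VAR(1) path of length T (after burn-in).
    x, path = np.zeros(p), []
    for t in range(burn + T):
        x = A @ x + rng.standard_normal(p)
        if t >= burn:
            path.append(x)
    X = np.array(path)
    # i.i.d. sample with the same marginal covariance Sigma.
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=T)
    # Sign-invariant misalignment between sample and population eigenvectors.
    err_dep.append(1 - abs(v_pop @ leading_eigvec(X)))
    err_iid.append(1 - abs(v_pop @ leading_eigvec(Z)))

print("mean misalignment, dependent rows:", np.mean(err_dep))
print("mean misalignment, i.i.d. rows:   ", np.mean(err_iid))
```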

References:

Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. The Annals of Statistics, 36(6), 2791–2817.

Best Answer

Presumably, you could add the time component as an additional feature to your sampled points, and then they are i.i.d.? Basically, the original data points are conditional on time:

$$ p(\mathbf{x}_i \mid t_i) \ne p(\mathbf{x}_i) $$

But, if we define $\mathbf{x}'_i = (\mathbf{x}_i, t_i)$, then we have:

$$ p(\mathbf{x}'_i \mid t_i) = p(\mathbf{x}'_i) $$

... and the data samples are now mutually independent.

In practice, including time as a feature in each data point may result in one principal component that simply points along the time axis. But if any features are correlated with time, a component might combine one or more of those features with the time feature.
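A minimal sketch of that idea, with made-up data (the trend, noise levels and feature layout are arbitrary assumptions): append the time index $t_i$ as an extra column, standardize so that time does not dominate simply because of its units, and inspect the loadings to see which component aligns with, or mixes with, the time axis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
T = 300
t = np.arange(T)
x1 = 0.02 * t + rng.standard_normal(T)         # feature correlated with time (drifts upward)
x2 = rng.standard_normal(T)                    # feature unrelated to time
X_aug = np.column_stack([x1, x2, t])           # x'_i = (x_i, t_i)

pca = PCA().fit(StandardScaler().fit_transform(X_aug))
print(np.round(pca.components_, 2))            # rows = components; last column = time loading
```

With this toy setup, one component typically mixes x1 and the time column (they are correlated), while another loads mostly on x2.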