Quoting from the link in the above question, the methodology is as follows:
principal component analysis (PCA) can be used to determine the underlying drivers of the stock returns. The PCA method transforms the vector space of N assets into another vector space of N factors by singular value decomposition (SVD) of the sample covariance matrix. Each factor, an eigenvector from the SVD, represents a linear combination of the original N assets, and the factors are uncorrelated by definition, with variances equal to the eigenvalues from the SVD.
The asset returns and the sample covariance matrix can be written as
$$
R_i^e = \beta_{i,1}F_{1} + \beta_{i,2}F_{2} + \cdots + \beta_{i,N}F_{N} \\
\hat{\Sigma} = \beta D_{F} \beta^{T}
$$
where $\beta$ is the N by N matrix whose columns are the eigenvectors, and $D_F$ is the N by N diagonal matrix of eigenvalues.
PCA is often employed to reduce the dimensionality of the data. If the first L factors govern most of the variability of the asset returns, i.e. if $\frac{\sum_{l=1}^{L} \sigma_{F,l}^2}{\sum_{l=1}^{N} \sigma_{F,l}^2}$ is very close to 1, then the last N-L factors can be dropped,
$$
\hat{\Sigma} = \tilde{\beta}\tilde{D_F}\tilde{\beta}^T + D_{\epsilon}
$$
where $\tilde{\beta}$ is the N-asset by L-factor matrix of factor loadings (the first L eigenvectors), $\tilde{D_F}$ is the L by L diagonal matrix of the first L eigenvalues, and $D_{\epsilon}$ is the N-asset by N-asset diagonal matrix of the variances of the idiosyncratic components not explained by the first L factors.
See Chapter 8 of Professor Jorion’s “Value At Risk” for more details.
This technique is often used when the number of assets N is close to the number of samples T, which leads to spurious correlations in the sample covariance matrix, and when N > T, in which case the sample covariance matrix is singular.
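To see the N > T degeneracy concretely, here is a minimal sketch (the dimensions are arbitrary): with more assets than observations, the centered data matrix has rank at most T-1, so the sample covariance matrix is singular.
set.seed(1)
N <- 40                                 # more assets...
T <- 20                                 # ...than observations
rets <- matrix(rnorm(N*T), nrow=T)      # T observations of N assets
S <- cov(rets)
qr(S)$rank                              # at most T-1 = 19 < N, so S is singular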
Example
As a concrete example, here is an implementation in R for returns generated from the one-factor model
$$
R_{t} = m_{t}\beta + \epsilon_{t}
$$
where $R_{t}$ is an Nx1 vector of returns at time t, $m_t$ is the market return at time t, $\beta$ is the Nx1 vector of asset betas to the market return, and $\epsilon_{t}$ is Nx1 Gaussian noise at time t.
set.seed(42)
N <- 15                                 # number of assets
T <- 30                                 # number of observations
mvol <- 0.8                             # market volatility
market.betas <- runif(N, 0, 2)          # asset betas to the market
market.factor <- rnorm(T, 0, sd=mvol)   # market returns
epsilon <- matrix(rnorm(N*T, 0, sd=1), ncol=N)               # idiosyncratic noise
equity.rets <- market.factor %*% t(market.betas) + epsilon   # T x N asset returns
sample.cov <- cov(equity.rets)
prs <- prcomp(equity.rets)              # PCA (SVD of the centered returns)
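As a quick check of the variance ratio $\frac{\sum_{l=1}^{L} \sigma_{F,l}^2}{\sum_{l=1}^{N} \sigma_{F,l}^2}$ discussed above, we can inspect the cumulative proportion of variance explained by the PCs; with one dominant factor, the first entry should already be large.
cumsum(prs$sdev^2) / sum(prs$sdev^2)    # cumulative variance explained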
Keeping all the factors, we can reconstruct the sample covariance matrix exactly (modulo machine precision):
sum(abs(sample.cov - prs$rotation %*% diag(prs$sdev^2) %*% t(prs$rotation)))
[1] 8.925881e-13
Or we can drop the PCs with lower variance. A detailed answer discussing this is Relationship between SVD and PCA. Here we keep only the first PC, with the omniscience that the data come from a one-factor model.
eigs <- prs$sdev^2                      # eigenvalues of the sample covariance
eigs[-1] <- 0                           # zero out all but the first eigenvalue
pca1.cov <- prs$rotation %*% diag(eigs) %*% t(prs$rotation)  # rank-1 PCA covariance
Comparing the PCA covariance and sample covariance to the model covariance, $Var(m_{t}\beta)$, we can see improvements across a variety of distance metrics.
model.cov <- mvol^2 * market.betas %*% t(market.betas)  # Var(m_t * beta)
d1 <- function(m1, m2){sum(abs(m1 - m2))}       # entrywise L1 distance
d2 <- function(m1, m2){sum((m1 - m2)^2)}        # squared Frobenius distance
dinf <- function(m1, m2){max(abs(m1 - m2))}     # entrywise max distance
dist <- data.frame(
c(d1(model.cov, sample.cov), d1(model.cov, pca1.cov)),
c(d2(model.cov, sample.cov), d2(model.cov, pca1.cov)),
c(dinf(model.cov, sample.cov), dinf(model.cov, pca1.cov))
)
colnames(dist) <- c("d1", "d2", "dinf")
rownames(dist) <- c("sample cov", "pca1 cov")
dist
                 d1       d2      dinf
sample cov 74.25255 42.26942 1.6620401
pca1 cov   52.13983 18.74075 0.8036362
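Note that pca1.cov is rank one. To recover the full truncated estimator $\tilde{\beta}\tilde{D_F}\tilde{\beta}^T + D_{\epsilon}$ from above, one could also restore the idiosyncratic variances from the diagonal of the residual; a minimal sketch using the variables already defined:
d.eps <- diag(diag(sample.cov - pca1.cov))  # diagonal matrix of idiosyncratic variances
pca1.full.cov <- pca1.cov + d.eps           # full-rank truncated estimator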
The paper you cited (Donoho et al. 2013, Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model) is an impressive piece of work which I confess I did not really study. Nevertheless, I believe it is easy to see that the answer to your question is negative: using any kind of shrinkage estimator of the covariance matrix will not improve your PCA results and, specifically, will not lead to a "better understanding of the structure in the data".
In a nutshell, this is because shrinkage estimators only affect the eigenvalues of the sample covariance matrix and not the eigenvectors.
Let me quote the beginning of the abstract of Donoho et al.:
Since the seminal work of Stein (1956) it has been understood that the empirical covariance matrix can be improved by shrinkage of the empirical eigenvalues. In this paper, we consider a proportional-growth asymptotic framework with $n$ observations and $p_n$ variables having limit $p_n/n \to \gamma \in (0,1]$. We assume the population covariance matrix $\Sigma$ follows the popular spiked covariance model, in which several eigenvalues are significantly larger than all the others, which all equal $1$. Factoring the empirical covariance matrix $S$ as $S = V \Lambda V'$ with $V$ orthogonal and $\Lambda$ diagonal, we consider shrinkers of the form $\hat{\Sigma} = \eta(S) = V \eta(\Lambda) V'$ where $\eta(\Lambda)_{ii} = \eta(\Lambda_{ii})$ is a scalar nonlinearity that operates individually on the diagonal entries of $\Lambda$.
The abstract goes on to describe the paper's contributions, but what is important for us here is that the sample covariance matrix $S$ and its shrunken version $\hat\Sigma$ have the same eigenvectors. Principal components are given by projections of the data onto these eigenvectors, so they will not be affected by the shrinkage.
The only thing that can be affected is the estimate of how much variance is explained by each PC, because these estimates are given by the eigenvalues. (And as @Aksakal wrote in the comments, this can affect the number of retained PCs.) But the PCs themselves will not change.
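Here is a minimal sketch of this point (the shrinker $\eta$ below is arbitrary, chosen only to be a monotone scalar nonlinearity of the kind the paper considers): applying $\eta$ to the eigenvalues and reconstructing leaves the eigenvectors, and hence the principal components, unchanged up to sign.
set.seed(1)
X <- matrix(rnorm(200), ncol=5)
S <- cov(X)
e <- eigen(S)
eta <- function(lambda) 0.5*lambda + 0.1        # an arbitrary monotone scalar shrinker
Sigma.hat <- e$vectors %*% diag(eta(e$values)) %*% t(e$vectors)
max(abs(abs(eigen(Sigma.hat)$vectors) - abs(e$vectors)))  # ~ 0 (machine precision)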
Best Answer
You will find a nice summary given by user @ttnphns here: https://stats.stackexchange.com/q/22520.
In particular:
In general, you should always center your data when performing PCA. As explained here, not centering your data can give misleading results.
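A small sketch illustrating that point (synthetic data with a deliberately large common mean): without centering, the leading direction found by prcomp points toward the data mean rather than along the direction of maximum variance.
set.seed(1)
X <- cbind(rnorm(100, mean=10, sd=1), rnorm(100, mean=10, sd=0.1))
prcomp(X, center=TRUE)$rotation[, 1]    # ~ +/-(1, 0): the high-variance direction
prcomp(X, center=FALSE)$rotation[, 1]   # ~ (0.7, 0.7): points toward the mean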