I'm trying to reduce the dimensionality and noise of a dataset by performing PCA on the dataset and throwing away the last few PCs.
After that, I want to use some machine learning algorithms on the remaining PCs, and therefore I want to normalize the data by equalizing the variance of the PCs to make the algorithms work better.
One simple way is to simply normalize the variance to unit values.
However, the first PC contains more variance from the original dataset than the following ones, and I still want to give it more "weight".
Therefore I was wondering: is there a simple way to just split its variance and share it with the PCs with less variances?
Another way is to map the PCs back to the original feature space, but in that case the dimensionality would also increase to the original value.
I guess it's better to keep the resultant columns orthogonal, but it's not necessary at this moment.
Best Answer
It is not completely clear to me that what you are asking is what you really need: a common preprocessing step in machine learning is dimensionality reduction + whitening, which means doing PCA and standardizing the components, nothing else. But I will nevertheless focus on your question as it is formulated, because it's more interesting.
Let $\mathbf X$ be the centered $n\times d$ data matrix with data points in rows and variables in columns. PCA amounts to singular value decomposition $$\mathbf X = \mathbf{USV}^\top \approx \mathbf U_k \mathbf S_k \mathbf V_k^\top,$$ where to perform the dimensionality reduction we keep only $k$ components. An orthogonal "factor rotation" of these components implies choosing an orthogonal $k \times k$ matrix $\mathbf R$ and plugging it into the decomposition: $$\mathbf X \approx \mathbf U_k \mathbf S_k \mathbf V_k^\top = \mathbf U_k \mathbf {RR}^\top \mathbf S_k \mathbf V_k^\top = \underbrace{\sqrt{n-1}\mathbf U_k^\phantom\top \mathbf {R}}_{\substack{\text{Rotated}\\\text{standardized scores}}} \cdot \underbrace{\mathbf R^\top \mathbf S_k \mathbf V_k^\top/\sqrt{n-1}}_{\text{Rotated loadings}^\top}.$$ Here $\sqrt{n-1}\mathbf U_k \mathbf R$ are rotated standardized components and the second term represents rotated loadings transposed. The variance of each component after rotation is given by the sum of squares of the corresponding loading vector; before rotation it is simply $s_i^2/(n-1)$. After rotation it is something else.
Now we are ready to formulate the problem in mathematical terms: given unrotated loadings $\mathbf L = \mathbf V_k \mathbf S_k / \sqrt{n-1}$, find rotation matrix $\mathbf R$ such that the rotated loadings, $\mathbf L \mathbf R$, has equal sum of squares in each column.
Let's solve it. Column sums of squares after rotation are equal to the diagonal elements of $$(\mathbf {LR})^\top \mathbf{LR} = \mathbf R^\top \frac{\mathbf S^2}{n-1} \mathbf R.$$ This makes sense: rotation simply redistributes the variances of components, which are originally given by $s_i^2/(n-1)$, between them, according to this formula. We need to redistribute them such they all become equal to their average value $\mu$.
I don't think there is a closed form solution to this, and in fact there are many different solutions. But a solution can be easily built in a sequential fashion:
This will redistribute all variances equally by a sequence of $(k-1)$ 2D rotations. Multiplying all these rotation matrices together will yield the overall $\mathbf R$.
Example
Consider the following $\mathbf S^2/(n-1)$ matrix: $$\left(\begin{array}{cccc}10&0&0&0\\0&6&0&0\\0&0&3&0\\0&0&0&1\end{array}\right).$$ The mean variance is $5$. My algorithm will proceed as follows:
Step 1: rotate PC1 and PC4 so that PC1 gets variance $5$. As a result, PC4 gets variance $1+(10-5)=6$.
Step 2: rotate PC2 (new maximal variance) and PC3 so that PC2 gets variance $5$. As a result, PC3 gets variance $3+(6-5)=4$.
Step 3: rotate PC4 (new maximal variance) and PC3 so that PC4 gets variance $5$. As a result, PC3 gets variance $4+(6-1)=5$.
Done.
I wrote the Matlab script that implements this algorithm (see below). For this input matrix, the sequence of rotation angles is:
Component variances after each step (in rows):
The final rotation matrix (product of three 2D rotation matrices):
And the final $(\mathbf{LR})^\top \mathbf{LR}$ matrix is:
Here is the code:
Here is the code in Python provided by @feilong:
Note that this problem is completely equivalent to the following one: given $k$ uncorrelated variables with variances $\sigma_i^2$, find a rotation (i.e. a new orthogonal basis) that will yield $k$ variables with equal variances (but of course not uncorrelated anymore).