The classifier is KNN or an RBF-SVM. After dimensionality reduction (e.g., PCA, LDA, or their kernel variants KPCA and KLDA), is normalization still needed before classification?
In the LIBSVM package, the usual workflow is to first run `svm-scale` to normalize the features with min-max normalization, and then feed the scaled features to `svm-train`.
I'm not sure whether this normalization would harm the structure of the features produced by PCA, LDA, etc.
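For concreteness, the pipeline being asked about can be sketched with scikit-learn, where `MinMaxScaler` stands in for `svm-scale` and `SVC(kernel="rbf")` for `svm-train`; the dataset and component count here are arbitrary placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# dimension reduction first, then min-max scaling (the svm-scale step),
# then an RBF-SVM (the svm-train step)
pipe = make_pipeline(PCA(n_components=2), MinMaxScaler(), SVC(kernel="rbf"))
print(cross_val_score(pipe, X, y, cv=5).mean())
```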
Best Answer
PCA itself already requires normalization as a pre-processing step (centering at minimum, and typically standardizing each feature).
Would a further step of data normalization harm the data?
No, it would not. But is it really necessary?
The following exercise returns `1.0`.
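A minimal NumPy sketch of such an exercise, assuming two correlated Gaussian features that are whitened and then projected onto the first principal component of the whitened data:

```python
import numpy as np

rng = np.random.default_rng(0)
# two correlated Gaussian features (the mixing matrix is arbitrary)
X = rng.normal(size=(1000, 2)) @ np.array([[2.0, 0.5], [0.5, 1.0]])

X = X - X.mean(axis=0)                     # zero-mean the features
cov = X.T @ X / len(X)                     # covariance, 1/n convention
evals, evecs = np.linalg.eigh(cov)
Xw = (X @ evecs) / np.sqrt(evals)          # whitening: covariance becomes I

e = np.linalg.eigh(Xw.T @ Xw / len(Xw))[1][:, -1]  # first PC of the whitened data
proj = Xw @ e                              # project onto the first component
print(round(proj.var(), 6))                # -> 1.0
```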
Why? We are projecting two whitened features onto the first component. Assume a point in the whitened space is identified by a vector $a$. Its new coordinate $a'$ after the projection is $$a' = |a| \cos(\theta) = a \cdot \hat{b},$$
where $|a|$ is the length of $a$ and $\theta$ is the angle between $a$ and the unit vector $\hat{b}$ we are projecting onto. In this case $\hat{b}$ equals $e$, the eigenvector that maps each row vector onto the first principal component.
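As a quick numeric check of that identity (the vectors here are arbitrary examples):

```python
import numpy as np

a = np.array([3.0, 4.0])                  # a point in the whitened space
b = np.array([1.0, 2.0])                  # a direction to project onto
b_hat = b / np.linalg.norm(b)             # the unit vector b-hat

theta = np.arccos(a @ b_hat / np.linalg.norm(a))  # angle between a and b
print(a @ b_hat)                          # dot-product form of the projection
print(np.linalg.norm(a) * np.cos(theta))  # |a| * cos(theta): the same value
```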
What is the variance of the whitened features once projected onto the principal component?
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (a_i \cdot e)^2 = e^T \frac{A^T A}{n} e,$$
where $A$ is the matrix whose rows are the whitened points $a_i$.
$e^Te = 1$ by definition (eigenvectors are unit vectors). Note also that when we whitened the data, we imposed zero means on the features, so $\frac{A^TA}{n}$ is exactly the covariance matrix of the whitened data, which is the identity. Hence $\sigma^2 = e^T I e = e^Te = 1$: the projected feature already has zero mean and unit variance, and a further normalization step is unnecessary.