Solved – How to know when to stop reducing dimensions with PCA

classification, dimensionality reduction, feature selection, machine learning, pca

I'm using PCA to reduce dimensionality before feeding the data into a classifier. My bootstrap/cross-validation has shown a significant reduction in test error from applying PCA and keeping the PCs whose standard deviation is at least a given fraction (say, 0.05) of the standard deviation of the first PC. My features are actually histograms (i.e. vector-valued), so instead of applying PCA once globally to the whole dataset, I applied it locally to some of the features, which I preselected manually by size (picking the ones with the most columns). I've tried adjusting the aforementioned tolerance, and tried applying PCA to higher and lower numbers of these histogram features.
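
For concreteness, here is a minimal sketch of the selection rule described above, assuming scikit-learn's `PCA` and a hypothetical tolerance of 0.05 (the function and variable names are illustrative, not taken from any existing code):

```python
import numpy as np
from sklearn.decomposition import PCA

def keep_pcs_by_tolerance(X, tol=0.05):
    """Project X onto the PCs whose standard deviation is at least
    tol times the standard deviation of the first (largest) PC."""
    pca = PCA().fit(X)
    sds = np.sqrt(pca.explained_variance_)     # per-PC standard deviations
    n_keep = int(np.sum(sds >= tol * sds[0]))  # components above the threshold
    return pca.transform(X)[:, :n_keep], n_keep

# Stand-in for one histogram-valued feature block
X_block = np.random.rand(200, 50)
X_reduced, k = keep_pcs_by_tolerance(X_block, tol=0.05)
print(k, X_reduced.shape)
```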

My question is: can someone describe a more principled way of finding the amount of dimensionality reduction via PCA, as applied above, that yields the highest test accuracy for my classifier? Does it come down to running a loop over a sequence of tolerances and different PCA-treated features and computing the test error for each setting? That would be very computationally expensive.
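
To make the brute-force option concrete, here is a hedged sketch of what such a search could look like, using scikit-learn's `Pipeline` and `GridSearchCV` to tune the number of retained PCs (a stand-in for the sd tolerance); the data and the parameter grid are made up for illustration:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Hypothetical data standing in for the real feature matrix and labels
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 80))
y = rng.integers(0, 2, size=300)

# Treat the number of retained PCs as a hyperparameter and let
# cross-validation pick it, rather than hand-tuning the tolerance.
pipe = Pipeline([("pca", PCA()), ("clf", SVC())])
grid = GridSearchCV(
    pipe,
    param_grid={"pca__n_components": [5, 10, 20, 40, 80]},
    cv=5,
    n_jobs=-1,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```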

Best Answer

Note that ridge penalisation/regularisation is essentially doing model selection with PCA, except that it does so smoothly, by shrinking along each principal-component axis, rather than discretely, by dropping small-variance PCs. Because you are applying PCA to different subsets of variables, your approach roughly corresponds to having a separate regularisation parameter for each group rather than a single one for all the betas. The Elements of Statistical Learning explains ridge regression quite well and provides some comparisons.
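
To illustrate that correspondence (this sketch is not part of the original answer, and `lam` and the data are made up), the ridge solution can be written in the SVD basis, where it shrinks each principal-component direction by d_j² / (d_j² + λ) instead of dropping it outright:

```python
import numpy as np

# Illustrative sketch: ridge shrinks the fit along each principal-component
# axis by d_j^2 / (d_j^2 + lam), whereas dropping small-variance PCs applies
# hard 0/1 weights instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
X -= X.mean(axis=0)                 # centre so the SVD directions are the PCs
y = rng.normal(size=100)
lam = 5.0

# Direct ridge solution: (X'X + lam*I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Same solution in the SVD basis X = U diag(d) V': V diag(d/(d^2+lam)) U'y
U, d, Vt = np.linalg.svd(X, full_matrices=False)
beta_svd = Vt.T @ (d / (d**2 + lam) * (U.T @ y))

print(np.allclose(beta_ridge, beta_svd))  # True: ridge = smooth per-PC shrinkage
```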
