PCA dramatically reduces the accuracy of classification

classification, MATLAB, pca

I am doing classification of this UCI dataset in MATLAB. I represented the dataset as one matrix (instances x dimensions) and the labels as a second matrix (instances x 1).

With Naive Bayes I get a multiclass classification accuracy of 0.65. But when I use the dataset transformed with PCA, I get an accuracy of only 0.15, even if I use all dimensions. I guess I am doing something wrong. This is my MATLAB code:

% x_tr, y_tr   = training set, labels of training set
% x_tst, y_tst = testing set, labels of testing set
model = fitNaiveBayes(x_tr,y_tr);
Y=predict(model,x_tst);
acc=accuracyMC(Y); %0.65

%PCA usage
[COEFF,SCORE] = princomp(x_tr);
model_pca = fitNaiveBayes(SCORE,y_tr);
[COEFF,SCORE] = princomp(x_tst);
Y=predict(model_pca,SCORE);
acc_pca=accuracyMC(Y); %0.15

I also tried normalizing the data with z-score.

Best Answer

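First, you are transforming your training set and test set independently. Each call to princomp centers the data on that set's own mean and computes that set's own principal axes, so the test scores are not expressed in the coordinate system the model was trained on.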
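A minimal sketch of the mismatch, using synthetic data (the sizes, means and scales below are hypothetical, not taken from the UCI dataset). It uses pca, which MathWorks recommends in place of princomp; both center whatever matrix they are given on that matrix's own mean.

```matlab
% Compare test scores from an independent PCA (what the question's code does)
% with test scores obtained by reusing the training PCA (what the model expects).
rng(0);                                     % reproducible random numbers
X = randn(200, 4) * diag([4 3 2 1]) + 5;    % hypothetical 4-feature data, mean 5
x_tr  = X(1:100, :);
x_tst = X(101:200, :);

% Independent fits: each call centers on its own mean and finds its own axes
[coeff_tr, ~, ~, ~, ~, mu_tr] = pca(x_tr);
[~, score_tst_independent]     = pca(x_tst);

% Consistent projection: training mean and training axes
score_tst_consistent = bsxfun(@minus, x_tst, mu_tr) * coeff_tr;

% A nonzero difference means the independently computed scores live in a
% different coordinate system from the one the classifier was trained in
maxDiff = max(abs(score_tst_independent(:) - score_tst_consistent(:)));
fprintf('Largest score difference: %.3f\n', maxDiff);
```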

What you want to do instead is perform PCA on your training set, obtain the coefficients, train on the transformed data, and then transform your test data using the training PCA coefficients before prediction.
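The procedure above can be sketched as follows. This is a hedged example, not the code from the linked thread: it uses pca and fitcnb (the newer functions MathWorks recommends over princomp and fitNaiveBayes), synthetic three-class data, and a plain mean(y_pred == y_tst) in place of the asker's accuracyMC helper. The key step is the test projection, which subtracts the training mean mu and multiplies by the training coeff.

```matlab
% Synthetic stand-in for the UCI data (hypothetical sizes and class structure)
rng(1);                                        % reproducible
n = 300; d = 6;
y = randi(3, n, 1);                            % 3 hypothetical classes
X = randn(n, d) + 2 * [y, -y, zeros(n, d-2)];  % classes differ in 2 features
idx   = randperm(n);
trIdx = idx(1:200);   tstIdx = idx(201:end);
x_tr  = X(trIdx, :);  y_tr  = y(trIdx);
x_tst = X(tstIdx, :); y_tst = y(tstIdx);

% 1) PCA on the training set only: coefficients, training scores, training mean
[coeff, score_tr, ~, ~, ~, mu] = pca(x_tr);

% 2) Train Naive Bayes on the training scores
model_pca = fitcnb(score_tr, y_tr);

% 3) Project the test set with the TRAINING mean and coefficients
score_tst = bsxfun(@minus, x_tst, mu) * coeff;

% 4) Predict and measure accuracy (numeric labels assumed)
y_pred  = predict(model_pca, score_tst);
acc_pca = mean(y_pred == y_tst);
fprintf('Accuracy with PCA: %.2f\n', acc_pca);
```

bsxfun keeps this compatible with releases before R2016b; on newer releases x_tst - mu works directly through implicit expansion.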

See this MATLAB thread for instructions.

Secondly, and perhaps more importantly, given the multiple feature families (textual, visual, and auditory) in your dataset, I'm not sure that the PCA transform is a valid choice.
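If you do keep PCA, one common mitigation when features come from different families, and so possibly sit on very different numeric scales, is to standardize each feature first, again using training statistics only (the same rule applies to the z-score normalization mentioned in the question). The sketch below uses synthetic data in which one hypothetical column is on a much larger scale, and shows that column dominating the first component without standardization. It does not settle whether PCA is appropriate for mixed textual, visual and auditory features.

```matlab
% Synthetic data; column 1 is a hypothetical feature on a much larger scale
rng(2);                                        % reproducible
n = 200;
X = [1000 * randn(n, 1), randn(n, 3)];         % 4 independent features
x_tr  = X(1:150, :);
x_tst = X(151:end, :);

% Raw PCA: the large-scale column accounts for almost all the variance
[~, ~, ~, ~, explained_raw] = pca(x_tr);
fprintf('PC1 explains %.1f%% of variance (raw)\n', explained_raw(1));

% zscore also returns the training mean and standard deviation...
[z_tr, mu_z, sigma_z] = zscore(x_tr);
% ...which are reused unchanged on the test set (never recomputed from it)
z_tst = bsxfun(@rdivide, bsxfun(@minus, x_tst, mu_z), sigma_z);

% PCA on the standardized training data; project the test set with the
% training mean and coefficients, as in the unstandardized case
[coeff_z, score_tr, ~, ~, explained_z, mu_pca] = pca(z_tr);
score_tst = bsxfun(@minus, z_tst, mu_pca) * coeff_z;
fprintf('PC1 explains %.1f%% of variance (standardized)\n', explained_z(1));
% score_tr and score_tst can then be passed to fitcnb and predict
```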