Machine Learning – Essential Steps for Data Processing Before Applying SVM

machine learning, svm

I am working on classifying audio files. It is a binary classification problem, and I plan to use an SVM. I have used SVMs before for face matching and other image analysis and retrieval tasks.

I have extracted the required feature vectors from the audio files for both the training and test datasets, and reduced their dimensionality using Principal Component Analysis (PCA). Are any further steps necessary before applying SVM classification and prediction? Should the training and test datasets obtained after PCA be normalized or centered? Would normalization or centering make the results better or worse? Are there other methods for pre-processing data before an SVM is applied to it?

Best Answer

It's advisable to scale every input feature to a fixed interval ($[-1,1]$ and $[0,1]$ are popular choices). That way, features that happen to have large values won't dominate the others. Scaling can have a large effect on accuracy. Make sure to apply the same scaling factors to both the training and test data: compute them from the training data and reuse them, unchanged, on the test data.
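As a rough illustration of that last point, here is a minimal sketch of min-max scaling to $[-1,1]$ in plain Python. The helper names `fit_min_max` and `apply_min_max` and the toy data are hypothetical, made up for this example; the point is only that the per-feature minima and maxima come from the training set and are then reused for the test set.

```python
from typing import List, Sequence, Tuple


def fit_min_max(train: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Compute per-feature minima and maxima from the training rows only."""
    columns = list(zip(*train))  # transpose rows into feature columns
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    return mins, maxs


def apply_min_max(
    rows: Sequence[Sequence[float]],
    mins: Sequence[float],
    maxs: Sequence[float],
    lo: float = -1.0,
    hi: float = 1.0,
) -> List[List[float]]:
    """Map each feature linearly so the training range [min, max] becomes [lo, hi]."""
    scaled = []
    for row in rows:
        out = []
        for x, mn, mx in zip(row, mins, maxs):
            span = mx - mn
            if span == 0:
                # Constant feature in the training set: place it mid-interval.
                out.append((lo + hi) / 2)
            else:
                out.append(lo + (hi - lo) * (x - mn) / span)
        scaled.append(out)
    return scaled


# Hypothetical toy data: two features on very different scales.
train = [[0.0, 100.0], [5.0, 300.0], [10.0, 200.0]]
test = [[2.5, 150.0], [12.0, 400.0]]

mins, maxs = fit_min_max(train)                  # factors from training data only
train_scaled = apply_min_max(train, mins, maxs)
test_scaled = apply_min_max(test, mins, maxs)    # same factors reused on test data
```

Because the factors come from the training data, test values that lie outside the training range will land outside $[-1,1]$; that is expected and is not a reason to rescale the test set separately. If you use scikit-learn, `sklearn.preprocessing.MinMaxScaler` follows the same fit-on-train, transform-both pattern.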

For more information, you can have a look at a practical guide to SVM classification.