Solved – Python Scikit-Learn SVM – No Predicted Samples for a Class

Tags: classification, python, scikit-learn, svm

I am doing a classification task in Python to classify audio files of different musical instruments into their respective classes. In my case there are 4 classes: Brass, String, Percussion, and Woodwind. I used an SVM as the classifier. My code looks roughly like this (I did not change any parameters of the classifier):

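from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# X is the feature matrix, y is the class vector
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# SVM classifier with default parameters
svm = SVC()
svm.fit(X_train, y_train)
svm_pred = svm.predict(X_test)
print(metrics.classification_report(y_test, svm_pred))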

When I run this code, I get a problem with the classifier. The classification report and the accompanying warning look like this:

            precision  recall   f1-score   support

Brass         1.00      0.21      0.34        72
Percussion    0.38      1.00      0.55       279
String        1.00      0.15      0.26       276
Woodwind      0.00      0.00      0.00       156

avg / total   0.58      0.43      0.32       783

C:\Users\Anaconda3\lib\site-packages\sklearn\metrics\classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.

When I checked the labels predicted by the SVM classifier (svm_pred), I found that the Woodwind class is never predicted:

>>> set(svm_pred)
{'Brass','String','Percussion'}

The number of samples in each class is: Brass = 200, Woodwind = 500, Percussion = 900, and String = 800, so the data set is somewhat imbalanced.

My question is: is it possible for an SVM classifier to never predict one of the classes at all, as in my case above?

Best Answer

First, you write "I do not change any parameter for the classifier." This is poor practice. Different problems have different optimal hyperparameter configurations, because the best bias-variance tradeoff depends on your particular data. For an RBF-kernel SVC, the main hyperparameters to tune are C and gamma.
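As a rough illustration of tuning rather than accepting the defaults, here is a minimal sketch of a cross-validated grid search over C and gamma. The synthetic data from make_classification is only a stand-in for the audio features, and the grid values and the macro-F1 scoring choice are assumptions for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the audio features: 4 classes, mildly imbalanced.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10, n_classes=4,
    weights=[0.1, 0.2, 0.35, 0.35], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Illustrative grid; these ranges are assumptions, not tuned recommendations.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, "scale"]}

# Macro-averaged F1 weights every class equally, so a class that is
# never predicted pulls the score down.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1_macro", cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # macro F1 on the held-out split
```

Scoring on macro F1 instead of accuracy is one possible choice here; it makes a configuration that ignores a whole class look as bad as it is.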

Second, you need a method to relate the inputs (audio signals) to the model that you're using. The absolutely generic application of an SVM is comparing vectors. Are the audio files you're using represented as single vectors, or are they represented in some more complex format (perhaps a matrix)? Also, the default SVM kernel in sklearn is the RBF kernel. Does this kernel make sense for your inputs, or would an alternative kernel return more relevant comparisons of your inputs? Could a pre-processing step help extract more useful signal? These are all research questions you'll have to consider as you pursue your project.
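One way to start exploring these questions is to compare kernels under cross-validation, with a pre-processing step in front of the classifier. The sketch below assumes each audio file has already been reduced to a single fixed-length feature vector (one row of X); the synthetic data, the choice of StandardScaler as the pre-processing step, and the three kernels compared are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: one fixed-length feature vector per "audio file".
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=10, n_classes=4, random_state=0
)

kernel_scores = {}
for kernel in ["linear", "poly", "rbf"]:
    # The scaler sits inside the pipeline, so it is re-fit on each
    # training fold and never sees the validation fold.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X, y, cv=5, scoring="f1_macro")
    kernel_scores[kernel] = scores.mean()
    print(kernel, round(kernel_scores[kernel], 3))
```

Because the RBF kernel is computed from Euclidean distances between vectors, features on very different scales can dominate the comparison, which is why scaling is a common first thing to try.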

Third, the predict method applies a fixed, default cutoff to the underlying decision values to decide which class a sample is allocated to (for a multiclass SVC, each pairwise decision function is thresholded at zero and the class with the most votes wins). This means that the classification is sensitive to the choice of cutoff, and choosing a different cutoff will yield a different classification.
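To see the continuous values that sit behind predict, you can inspect decision_function. This is a small sketch on synthetic data (an assumption, not the original audio features); it requests the one-vs-one output shape so that each column corresponds to one pair of classes.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 4-class stand-in for the instrument data.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=4, random_state=0
)

clf = SVC(decision_function_shape="ovo").fit(X, y)

scores = clf.decision_function(X[:5])  # one column per pair of classes
labels = clf.predict(X[:5])            # hard labels derived from those values

print(scores.shape)  # 4 classes give 4 * 3 / 2 = 6 pairwise columns
print(labels)
```

The hard labels discard how close each sample was to a boundary, which is the information a cutoff-based rule throws away.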

Classification is a decision. To make an optimal decision, you need to assess a utility function, which implies that you need to account for the uncertainty in the outcome, i.e. a probability.
The costs of misclassification are not uniform across all units.
Don't use cutoffs.
Use proper scoring rules.
The problem is actually risk estimation, not classification.

In other words, it's worth considering what the consequences are for a correct or incorrect decision, and selecting classification rules which minimize your risk. This is true regardless of whether or not you think the process of audio reproduction is deterministic, because some problems may have steep costs for misclassifying specific instruments, but not others.
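A minimal sketch of this idea follows: estimate class probabilities, evaluate them with a proper scoring rule (log loss), and pick the decision with the lowest expected cost. Several things here are assumptions for illustration: the data is synthetic, the probabilities come from wrapping SVC in CalibratedClassifierCV (sigmoid calibration), which is one possible way to obtain them, and the cost matrix is entirely hypothetical.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 4-class stand-in for the instrument data.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Calibrate the SVM's decision values into probability estimates.
clf = CalibratedClassifierCV(SVC(), method="sigmoid", cv=5)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # columns follow clf.classes_

# Log loss is a proper scoring rule: it evaluates the probabilities
# themselves rather than thresholded labels.
print(log_loss(y_test, proba, labels=clf.classes_))

# Hypothetical cost matrix: cost[i, j] is the cost of deciding class j
# when the true class is i. Correct decisions cost 0, errors cost 1,
# except that missing class 3 is assumed to be five times as costly.
cost = np.ones((4, 4)) - np.eye(4)
cost[3, :3] = 5.0

# Expected cost of each possible decision, then choose the cheapest.
expected_cost = proba @ cost
decisions = clf.classes_[expected_cost.argmin(axis=1)]
print(np.bincount(decisions, minlength=4))  # how often each class is chosen
```

With uniform costs this rule reduces to picking the most probable class; raising the cost of missing one class shifts decisions toward it, which is the kind of trade-off worth making explicit.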
