Solved – Training SVM yields predictions of only 1

machine-learning, python, scikit-learn, svm

I'm provided with the following dataset: Dataset. I'm meant to use sklearn to create a Support Vector Machine that can predict its labels.

I load features A and B from my dataset into a two-dimensional array called input_data, and load the labels into an array called label.

First of all, I'm scaling A and B to fit in the range of -1 to 1 using sklearn.preprocessing.MinMaxScaler:

import numpy as np
from sklearn import preprocessing as pre

# Scale both features to the [-1, 1] range
input_data = np.array(input_data)
minmax = pre.MinMaxScaler(feature_range=(-1, 1))
input_data = minmax.fit_transform(input_data)
label = np.array(label)

Then I use sklearn.model_selection.train_test_split to stratify and split my data, keeping 40% for my test set and 60% for my training set.

from sklearn import model_selection

# Stratified 60/40 train/test split
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    input_data, label, stratify=label, test_size=0.4)

After that I use sklearn.model_selection.GridSearchCV to find the optimal hyperparameters (in my case only C).

from sklearn import model_selection, svm

# tuned_parameters is my parameter grid (defined elsewhere)
clf = model_selection.GridSearchCV(svm.SVC(), tuned_parameters, cv=5, scoring='precision')
clf.fit(X_train, y_train)

After that I evaluate my results, but my Support Vector Machine outputs 1 no matter what the input is. I'm not sure what exactly I'm doing wrong, and after trying to figure it out by reading the documentation for a few hours I decided to ask for help. Can anyone point out what I'm doing wrong? I'm pretty new to this area, so my mistake is probably something simple I missed. Thanks in advance for any advice!

Here's an image of the dataset using a scatter plot:

Best Answer

What is tuned_parameters?
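For reference, a grid for SVC often looks something like the following. This is only an illustration; the actual tuned_parameters from the question was never shown, which is exactly why it matters:

```python
# Hypothetical parameter grid -- the asker's real tuned_parameters
# is unknown, so these values are just an example.
tuned_parameters = {
    "C": [0.1, 1, 10, 100],       # regularization strength
    "gamma": [0.01, 0.1, 1, 10],  # RBF kernel width
}
```

A badly chosen grid (or scoring metric) can easily select a degenerate model, so it's worth posting.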

If I train SVC with default parameters on your dataset, it works fine with 61% accuracy, predicting both classes.
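A minimal sketch of that sanity check, using synthetic two-class data as a stand-in for the original dataset (which isn't reproduced here):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the original A/B dataset
X, y = make_blobs(n_samples=400, centers=2, cluster_std=2.0, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.4, random_state=0)

# Default-parameter SVC: a healthy model predicts both classes,
# not just a constant 1
model = SVC()
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(np.unique(pred))
print(model.score(X_test, y_test))
```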

model.predict(X_test) with the model trained with your parameters outputs both 0 and 1 for me, with 98% accuracy:

model = svm.SVC(kernel='rbf', C=10, gamma=10)
model.fit(X_train, y_train)
print(model.predict(X_test))        # both 0 and 1 appear
print(model.score(X_test, y_test))  # ~0.98

So the question is: how do you check what the model outputs on your test set? You may have a mistake there.
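A common mistake is evaluating on the wrong array (for example, predicting on the training features, or on unscaled data). A minimal sketch of a sound check, again using synthetic stand-in data since the original dataset isn't available:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data; substitute your own scaled input_data / label arrays
X, y = make_blobs(n_samples=400, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.4, random_state=0)

model = SVC().fit(X_train, y_train)

# Predict on the held-out features X_test (not X_train, not y_test)
pred = model.predict(X_test)

# If the model were really predicting only one class,
# it would show up immediately here
print(np.unique(pred))
print(classification_report(y_test, pred))
```

If np.unique(pred) returns a single value at this point, the problem is in the model or the grid; if it returns both classes, the problem is in how the predictions were being inspected.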