I classified some medical images. The distribution of the dataset is:
494 Train Abnormal
469 Train Normal
64 Test Abnormal
37 Test Normal
84 Val Abnormal
37 Val Normal
…
My training result (with a ViT) is:
loss: 0.2714 – accuracy: 0.9102 – val_loss: 0.2624 – val_accuracy: 0.9196
and test result is:
              precision    recall  f1-score   support

           0       0.57      0.60      0.58        47
           1       0.46      0.43      0.44        37

    accuracy                           0.52        84
   macro avg       0.51      0.51      0.51        84
weighted avg       0.52      0.52      0.52        84
- So my question is: is the poor test performance caused by imbalanced data, or should I be looking for something else?
I know that ~1,000 images is not much for deep learning, but I have to complete this training with them. I have also implemented data augmentation.
Best Answer
There might be a distribution difference between the training/validation data and the test data, which would explain the gap between a ~0.92 validation accuracy and a 0.52 test accuracy. Another potential cause is heavy tuning on the validation set: if you repeatedly selected hyperparameters or checkpoints based on validation accuracy, the model can effectively overfit to the validation set, and the validation score stops being a reliable estimate of test performance.
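One way to reduce the risk of overfitting to a single validation split is stratified k-fold cross-validation, so that model selection is averaged over several splits instead of one. A minimal sketch with scikit-learn is below; the features, labels, and the logistic-regression stand-in are placeholders (not the asker's ViT pipeline), and the class counts simply mirror the training split sizes from the question:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Placeholder data standing in for extracted image features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(963, 16))          # 963 = 494 + 469 training images
y = np.array([0] * 494 + [1] * 469)     # 0 = abnormal, 1 = normal (arbitrary)

# Stratified folds preserve the class ratio in every train/val split,
# so each fold's validation score is measured on the same class balance.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in skf.split(X, y):
    clf = LogisticRegression(max_iter=1000)   # stand-in for the real model
    clf.fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[val_idx], clf.predict(X[val_idx])))

# The mean and spread across folds is a more honest estimate than a
# single validation split that the model may have been tuned against.
print(f"CV accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

If the cross-validated estimate is much lower than the single-split validation accuracy, that points to overfitting on the validation set rather than a train/test distribution shift.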