I am working on a simple multi-class classification ML problem.
I was given training data with perfectly balanced classes. However, the data I must predict on is not balanced. I was able to deduce the class proportions of the test data.
Is there a way to split the training data into training and validation sets so that the validation set has arbitrarily chosen class proportions?
In short: lots of people want to build balanced training and validation sets from imbalanced data. I want the reverse: an imbalanced validation set from a balanced training set.
Reasoning: I want my validation set to look like the test set. I know that 2 of the 7 labels cover 90% of the test data (while they cover only 28% of the training data), and I want my validation set to have the same structure.
Best Answer
I'm not sure about the purpose of your task, but you can do it with the stratify argument, using the proportion of each class in the test set.
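One caveat, in case the answer refers to scikit-learn's train_test_split (an assumption, since the answer does not name the function): its stratify parameter takes an array of class labels and reproduces the proportions found in that array, so it does not accept target proportions directly. A minimal sketch of one way to get arbitrary proportions is to sample a fixed number of rows per class with NumPy. The helper name split_with_target_proportions, the toy labels, and the 45%/2% proportions below are all hypothetical, chosen only to mirror the "2 labels out of 7 cover 90%" situation in the question.

```python
import numpy as np

def split_with_target_proportions(y, target_props, val_size, seed=0):
    """Return (train_idx, val_idx) so that the validation labels follow target_props.

    y            : 1-D array of class labels
    target_props : dict mapping label -> desired fraction in the validation set
    val_size     : desired number of validation rows (rounding may shift it slightly)
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    val_parts = []
    for label, prop in target_props.items():
        class_idx = np.flatnonzero(y == label)   # row indices of this class
        n_val = int(round(prop * val_size))      # rows of this class to hold out
        if n_val > class_idx.size:
            raise ValueError(f"not enough rows of class {label!r} for the requested share")
        # Sample without replacement so no row is picked twice.
        val_parts.append(rng.choice(class_idx, size=n_val, replace=False))
    val_idx = np.concatenate(val_parts)
    # Everything not held out for validation stays in the training set.
    train_idx = np.setdiff1d(np.arange(y.size), val_idx)
    return train_idx, val_idx

# Toy data (hypothetical): 7 balanced classes with 1000 rows each.
y = np.repeat(np.arange(7), 1000)

# Assumed test-set proportions: classes 0 and 1 cover 90%, the other five share 10%.
target_props = {0: 0.45, 1: 0.45, 2: 0.02, 3: 0.02, 4: 0.02, 5: 0.02, 6: 0.02}

train_idx, val_idx = split_with_target_proportions(y, target_props, val_size=1000)
val_counts = np.bincount(y[val_idx], minlength=7)  # per-class counts in the validation set
```

The index arrays can then be used to slice the feature matrix and labels, e.g. X[val_idx] and y[val_idx]. Note that the remaining training set is no longer perfectly balanced, because more rows were removed from the two dominant classes.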