Solved – Ideal learning sample in machine learning

Tags: bias, data mining, machine learning, prediction, sampling

I am constructing a model to predict a binary (Yes/No) outcome. My training sample contains 1500 examples of the "Yes" group and 500 examples of the "No" group. Should I use all of the data I have to train the model? Would the result be biased towards "Yes"?

I thought of using 500 "Yes" and 500 "No" examples instead, but I am not sure whether this would affect my future predictions positively or negatively.

Thanks.

Best Answer

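Most learning algorithms have a way to deal with skewed (class-imbalanced) data sets, for example by weighting the classes differently during training. In general, use as much data as you can for training, since more data tends to improve generalization performance.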
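As a rough illustration of one such mechanism, here is a minimal sketch of inverse-frequency class weighting applied to the 1500/500 split from the question. The function name `balanced_class_weights` is hypothetical, and the formula (number of samples divided by number of classes times the class count) is an assumption about which scheme to use; it is the same heuristic that scikit-learn applies for `class_weight="balanced"`, but other algorithms and libraries may handle imbalance differently.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return one weight per class, inversely proportional to class frequency.

    Hypothetical helper for illustration only:
    weight = n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# The class split described in the question: all 2000 examples are kept.
labels = ["Yes"] * 1500 + ["No"] * 500
weights = balanced_class_weights(labels)

# Total weight contributed by each class (class weight times class count).
total_weight = {cls: weights[cls] * n for cls, n in Counter(labels).items()}
print(weights)
print(total_weight)
```

With these weights, each "No" example counts three times as much as each "Yes" example, so both classes contribute the same total weight to the training objective while no data is thrown away. Whether this actually helps depends on the algorithm and on the evaluation metric you care about.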
