Solved – Sklearn – Choosing the right model for supervised learning/classification task

classification, machine learning, python, scikit-learn

I am beginning to learn how to use scikit-learn and I have a hard time choosing the right model.

Here is my dataset:
I have 100 people. Each person was measured three times: at baseline, at a first event and at a second event.
Each measurement consists of 100 different markers per person, with values ranging from 0.1 to 1000.
Additionally, I have an outcome measurement for each event: the outcome can be 0, 1 or 2.
My task is to find just a few markers (let’s say 10) that can predict the outcome with good accuracy.
If I am right, this is a supervised learning/classification problem.
Which model would be best?

Thanks for your help!

Best Answer

I'd start by visualizing the data: plot it in 3D, use a generalized scatterplot matrix, or use PCA (if you know what it is) to project the data to 2D, and then try to see the structure.
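As a rough illustration of the PCA step, here is a minimal sketch. The data is a synthetic stand-in, not the asker's: I'm assuming the two events per person are stacked into 200 rows of 100 markers. The log-scaling and standardization are my own preprocessing choices, made because the markers span 0.1 to 1000.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (an assumption): 100 persons x 2 events
# gives 200 samples, each with 100 markers in the range 0.1 to 1000.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1000.0, size=(200, 100))
y = rng.integers(0, 3, size=200)  # outcome is 0, 1 or 2

# The markers span four orders of magnitude, so log-scale and standardize
# first (a choice made for this sketch, not something stated in the question).
X_scaled = StandardScaler().fit_transform(np.log10(X))

# Project the 100 markers onto the first two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                     # one 2D point per sample
print(pca.explained_variance_ratio_)  # variance share kept by each component
```

If matplotlib is available, something like `plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)` would color the points by outcome. If the explained variance ratios are small, the 2D picture hides most of the variation and should be read with caution.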

Once the data is reduced to so few dimensions, it should be easy to see visually which classifier is likely to work best, using scikit-learn's classifier comparison example as a guide.
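If eyeballing the plot is not conclusive, the comparison can also be done numerically. The sketch below cross-validates a few common scikit-learn classifiers on a 2D PCA projection. The data is again synthetic (an assumption), and the four candidates and their settings are an arbitrary selection of mine, not a recommendation from the docs.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (an assumption): 200 samples,
# 100 log-scaled markers, outcome in {0, 1, 2}.
rng = np.random.default_rng(0)
X = np.log10(rng.uniform(0.1, 1000.0, size=(200, 100)))
y = rng.integers(0, 3, size=200)

# Candidate classifiers; the selection and settings are illustrative only.
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "RBF SVM": SVC(kernel="rbf"),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, clf in candidates.items():
    # Scaling and PCA sit inside the pipeline so they are refit on each
    # training fold and never see the held-out fold.
    model = make_pipeline(StandardScaler(), PCA(n_components=2), clf)
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

# Print mean 5-fold accuracy, best first.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

On this random data every model should score near chance (about one in three), so the numbers only become meaningful once the real markers and outcomes are plugged in.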
