RandomForest classification model with 100% accuracy: is it real, or is something wrong?


Hi, I am new to machine learning. I just created my first working RandomForest classification model, implemented with Apache Spark MLlib. It runs without errors and its accuracy is 100%. Other machine learning experts around me say that 100% accuracy is a dream and that you never actually get it. Is that true? I trained the random forest classifier with 95 decision trees and a maximum tree depth of 15, using Gini impurity and sqrt as the feature subset strategy. I checked the model's predictions against the response values in my test data, and they match 100%. I have two response values: Actionable and NonActionable. I told my senior that I will test the model on a larger, real-time dataset to see whether the result holds up. Please advise. Thanks in advance.

Best Answer

It is highly probable that you have "label leakage" in one of your features (i.e., a feature that is 100% correlated with the label). For example, if you have data like this:

$$\begin{array}{|r|r|} \hline \text{Feature} & \text{Label} \\ \hline 1.0 & \text{Actionable} \\ 0.0 & \text{NonActionable} \\ \hline \end{array}$$

Then the model can always predict the correct label just by checking the value of that feature.
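One quick way to look for this kind of leakage is to check, for each feature on its own, whether any of its values is ever paired with more than one label. The sketch below does that in plain Python rather than Spark; the function name `perfectly_predictive_features` and the toy rows are hypothetical. It only catches lookup-style leakage on discrete features like the one in the table, and a column with a distinct value in every row (such as a record ID) will be flagged trivially, so treat any hit as a column to inspect rather than as proof of leakage.

```python
from collections import defaultdict


def perfectly_predictive_features(rows, labels):
    """Return indices of features whose value alone determines the label.

    Hypothetical helper: a feature is flagged when every distinct value it
    takes in `rows` is paired with exactly one label, i.e. a lookup table on
    that single feature would reproduce the labels with 100% accuracy.
    """
    if not rows:
        return []
    n_features = len(rows[0])
    flagged = []
    for j in range(n_features):
        # Collect the set of labels seen for each value of feature j.
        labels_for_value = defaultdict(set)
        for row, label in zip(rows, labels):
            labels_for_value[row[j]].add(label)
        # Flag the feature if no value maps to more than one label.
        if all(len(seen) == 1 for seen in labels_for_value.values()):
            flagged.append(j)
    return flagged


# Toy data mirroring the table above (hypothetical): feature 0 is a copy of
# the label, feature 1 takes the same values for both classes.
rows = [(1.0, 3.0), (0.0, 3.0), (1.0, 5.0), (0.0, 5.0)]
labels = ["Actionable", "NonActionable", "Actionable", "NonActionable"]
print(perfectly_predictive_features(rows, labels))  # [0]
```

If a feature is flagged, check how it is produced: if it is computed from, or recorded after, the Actionable/NonActionable decision, it should not be used as a model input.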