I'm a PhD student in Information Retrieval with limited experience in ML. We've been working on a binary classification task in Weka (which I'm using programmatically from Java), specifically with Random Forest.
Our results are coming out a little weird because we have an unbalanced dataset (roughly 85/15). We get a very high percent correct, but precision and recall for our target class (the 15% one) are very low.
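To illustrate what I mean (with made-up numbers matching our class ratio, not our actual results): on an 85/15 split, a degenerate classifier that always predicts the majority class already scores 85% correct while finding none of the target class.

```java
public class ImbalanceDemo {
    // Percent correct for a classifier that always predicts the majority class:
    // every majority instance is "correct", every minority instance is missed.
    static double majorityAccuracy(int positives, int negatives) {
        return (double) negatives / (positives + negatives);
    }

    public static void main(String[] args) {
        // Hypothetical 85/15 split like ours: 85 negatives, 15 positives.
        System.out.printf("accuracy=%.2f%n", majorityAccuracy(15, 85)); // prints accuracy=0.85
        // ...yet recall on the positive (target) class is 0.00 -- it finds none of them.
    }
}
```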
My new understanding is that percent correct is really not the right metric to be looking at. The professor I work with said (and I quote): "You are measuring accuracy with 'percent correct'. This is so rarely done in machine learning papers these days that I just blew right past it." He also referenced a paper explaining why I shouldn't use percent correct as an accuracy measure [1].
In our case we are interested in precision and recall to some extent, but the professor I'm working with (he's an ML expert) explained that we can and should use AUC-ROC to compare runs, because it is not sensitive to class balance. After he explained it in depth, I understood. I was also able to get the AUC out of the Weka results; it's decent though not spectacular (around 0.75).
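The way I now understand it: AUC-ROC equals the probability that a randomly drawn positive instance is scored above a randomly drawn negative one (ties counted as 1/2), so it only compares rankings across the two classes and doesn't depend on the 85/15 ratio. A small sketch with toy scores (not our actual data):

```java
public class AucSketch {
    // AUC as the fraction of (positive, negative) pairs the classifier ranks
    // correctly, counting tied scores as half a correct pair.
    static double auc(double[] posScores, double[] negScores) {
        double wins = 0;
        for (double p : posScores)
            for (double n : negScores)
                if (p > n) wins += 1.0;
                else if (p == n) wins += 0.5;
        return wins / (posScores.length * negScores.length);
    }

    public static void main(String[] args) {
        double[] pos = {0.9, 0.8, 0.4};        // hypothetical scores for positives
        double[] neg = {0.7, 0.3, 0.2, 0.1};   // hypothetical scores for negatives
        // 11 of the 12 cross-class pairs are ranked correctly.
        System.out.printf("AUC=%.3f%n", auc(pos, neg)); // prints AUC=0.917
    }
}
```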
I'm used to IR systems that can be tuned for various metrics, e.g. precision, F-measures, MAP, etc. However, as far as I can tell, Weka always trains its classifier models to optimize percent correct. So even though I'm interested in another metric, e.g. precision or F1, I can't for the life of me figure out how to get Weka to train its model to optimize anything other than percent correct (say, F1).
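To make concrete what I'm after: one workaround I've considered is leaving the model's training alone and instead sweeping the decision threshold over its predicted probabilities, keeping the threshold that maximizes F1 on held-out data (if I've understood the docs correctly, Weka's `weka.classifiers.meta.ThresholdSelector` does something along these lines, though it doesn't change how the base model itself is trained). A stdlib-only sketch of the idea with made-up scores and labels:

```java
public class ThresholdTuning {
    // F1 of the "predict positive iff score >= threshold" rule.
    static double f1At(double threshold, double[] scores, boolean[] isPositive) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean predictedPositive = scores[i] >= threshold;
            if (predictedPositive && isPositive[i]) tp++;
            else if (predictedPositive) fp++;
            else if (isPositive[i]) fn++;
        }
        if (tp == 0) return 0.0;
        double precision = (double) tp / (tp + fp);
        double recall = (double) tp / (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Hypothetical held-out predicted probabilities and true labels.
        double[] scores = {0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10};
        boolean[] pos   = {true, true, false, true, false, false, true, false};
        double bestT = 0.5, bestF1 = -1;
        for (double candidate : scores) {       // each observed score is a candidate cut
            double f1 = f1At(candidate, scores, pos);
            if (f1 > bestF1) { bestF1 = f1; bestT = candidate; }
        }
        System.out.printf("best threshold=%.2f F1=%.3f%n", bestT, bestF1);
        // prints best threshold=0.55 F1=0.750
    }
}
```

This only re-tunes the positive/negative cutoff after training, though; what I'd really like is a way to make the training itself target F1.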
I've combed the Weka docs and Googled extensively (including a site search here on Cross Validated) but couldn't find anything on this.
Is that possible? I'd appreciate any insight into whether it can be done at all, whether it's simply not implemented in Weka, or whether there's a reason it shouldn't be done. Or perhaps I'm missing something because I'm calling Weka from Java rather than using the GUI.
—
[1] Provost, F. J., Fawcett, T., & Kohavi, R. (1998, July). The case against accuracy estimation for comparing induction algorithms. In Proceedings of ICML-98 (pp. 445–453). http://eecs.wsu.edu/~holder/courses/cse6363/spr04/pubs/Provost98.pdf
Best Answer
My cursory search did not find this option either. As you describe the problem, you want to use:
Let's try to relax one condition at a time.
Here are some possible alternatives: