What I'm doing now is ranking each model within each metric and summing the ranks. I consider whichever model has the lowest sum to be the best.

Perhaps it would be clearer with an example.

There are 5 models: Model A, Model B, Model C, Model D, and Model E. There are 3 evaluation metrics. A rank of 1 is the best.

I rank the models on each evaluation metric:

model | eval metric #1 rank | eval metric #2 rank | eval metric #3 rank |
---|---|---|---|
Model A | 4 | 3 | 4 |
Model B | 5 | 2 | 2 |
Model C | 1 | 1 | 5 |
Model D | 3 | 4 | 1 |
Model E | 2 | 5 | 3 |

The sum of each model's ranks across the three evaluation metrics is:

model | sum of ranks |
---|---|
Model A | 11 |
Model B | 9 |
Model C | 7 |
Model D | 8 |
Model E | 10 |

In this example, Model C has the lowest rank sum (7) and would be considered the best model.
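To make the procedure concrete, here is a minimal Python sketch of the rank-sum step using the ranks from the example. The dictionary layout and the variable names (`ranks`, `rank_sums`, `best`) are my own illustration, not part of any library.

```python
# Per-metric ranks from the example (1 = best), in the order
# metric #1, metric #2, metric #3.
ranks = {
    "Model A": [4, 3, 4],
    "Model B": [5, 2, 2],
    "Model C": [1, 1, 5],
    "Model D": [3, 4, 1],
    "Model E": [2, 5, 3],
}

# Sum each model's ranks across the metrics.
rank_sums = {model: sum(r) for model, r in ranks.items()}

# The model with the lowest rank sum is treated as the best.
best = min(rank_sums, key=rank_sums.get)

print(rank_sums)
print(best)
```

Note that `min` returns the first model it encounters if two models tie on the rank sum, so ties would need a separate rule.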

Does this process have a name? I'm having trouble searching Google for a better solution.

## Best Answer

You could try a critical difference diagram to compare ML classifiers. The details are here: https://www.jmlr.org/papers/volume7/demsar06a/demsar06a.pdf
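As a rough sketch of the quantity behind such a diagram, the snippet below computes average ranks and the Nemenyi critical difference, CD = q_alpha * sqrt(k(k+1) / (6N)), for the rank sums in the question. Two things here are assumptions: the three metrics are treated as if they were the N independent data sets the linked paper works with, which may not be justified, and `q_alpha = 2.728` is the value I recall from the paper's table for 5 classifiers at alpha = 0.05, so please verify it against the paper.

```python
import math

# Rank sums from the question's second table.
rank_sums = {"Model A": 11, "Model B": 9, "Model C": 7, "Model D": 8, "Model E": 10}

n_metrics = 3        # assumption: metrics play the role of data sets (N)
k = len(rank_sums)   # number of models being compared

# Average rank of each model across the metrics.
avg_ranks = {model: s / n_metrics for model, s in rank_sums.items()}

# Assumed critical value for alpha = 0.05 and k = 5; check the paper's table.
q_alpha = 2.728

# Nemenyi critical difference.
cd = q_alpha * math.sqrt(k * (k + 1) / (6 * n_metrics))

# Model pairs whose average ranks differ by at least the critical difference.
models = sorted(avg_ranks, key=avg_ranks.get)
significant = [
    (a, b)
    for i, a in enumerate(models)
    for b in models[i + 1:]
    if avg_ranks[b] - avg_ranks[a] >= cd
]

print(avg_ranks)
print(cd)
print(significant)
```

Under these assumptions the critical difference is larger than the gap between the best and worst average ranks, so `significant` comes out empty: with only three metrics, the ranking alone would not separate the models.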