Focal Loss – How to Choose Gamma Parameter

hyperparameter · loss-functions · machine-learning

I would like to know if there is a rule of thumb for setting the $\gamma$ parameter in the focal loss when we have very imbalanced classes.

The focal loss first appeared in *Focal Loss for Dense Object Detection* by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár.

Best Answer

The parameter $\gamma$ smoothly adjusts the rate at which easy examples are down-weighted, and the right rate is quite dataset- and application-dependent. In the paper, the focal loss is actually given as $-\alpha (1 - p_t)^\gamma \log(p_t)$, which is a reweighted view of the standard cross-entropy loss, and the class imbalance itself is "controlled" by $\alpha$ rather than $\gamma$. People often treat the focal loss as a tool that primarily addresses class imbalance, whereas it is actually a tool that primarily addresses information asymmetry during learning; i.e. the focal loss can be very relevant even when training on a balanced set where one of the two classes is easy to distinguish.

But to bring us back: there is no good rule of thumb aside from setting $\gamma = 2$ (as the paper suggests) and then adjusting it based on our evaluation criteria. This point highlights the difference between a loss function and an evaluation metric; CV.SE has a thread on Loss function and evaluation metric if you want to explore this distinction further, but the main point to carry here is that $\gamma$ (i.e. a hyper-parameter of our loss) needs to be adjusted in relation to our evaluation criteria and "on its own" is often meaningless.
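To make the down-weighting concrete, here is a minimal sketch in plain Python (the function name `focal_loss` and the probe values are my own, not from the paper) that evaluates $-\alpha (1 - p_t)^\gamma \log(p_t)$ for an "easy" and a "hard" example across several $\gamma$ values:

```python
import math

def focal_loss(p_t, gamma=2.0, alpha=1.0):
    """Focal loss for a single example.

    p_t is the model's predicted probability for the true class;
    gamma down-weights easy examples (p_t close to 1), while alpha
    is the class-balancing weight from the paper.
    """
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

# Compare an easy example (p_t = 0.9) with a hard one (p_t = 0.1).
# gamma = 0 recovers plain cross-entropy; larger gamma makes the
# hard example dominate the loss by a rapidly growing factor.
for gamma in (0.0, 0.5, 2.0, 5.0):
    easy = focal_loss(0.9, gamma)
    hard = focal_loss(0.1, gamma)
    print(f"gamma={gamma}: easy={easy:.5f}, hard={hard:.5f}, "
          f"hard/easy ratio={hard / easy:.1f}")
```

Note how the easy example's loss shrinks by a factor of $(1 - 0.9)^\gamma = 0.1^\gamma$ as $\gamma$ grows, which is exactly the "down-weighting of easy examples" discussed above.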
