Solved – Learn threshold for multi-label classification

Tags: multilabel, neural networks, threshold

I have a multi-label problem which I'm tackling with a neural network. To get the multi-label scores, I apply a tanh on the last layer (as suggested in the literature), and then predict as positive every label whose score is at or above a threshold (which, again, is often suggested to be 0.5). For example (pseudocode of what's happening in the network):

import numpy as np

threshold = 0.5
last_hidden = [1, 0.995, 0.39, -0.283, -1.033]
multi_label_scores = np.tanh(last_hidden)  # [0.7615942, 0.7594864, 0.3713602, -0.27567932, -0.77510875]
labels = [1 if s >= threshold else 0 for s in multi_label_scores]  # [1, 1, 0, 0, 0]

My question is: apart from setting the threshold to 0.5, or maybe finding a better value during parameter tuning, is there a way to learn such a threshold, for example using specific max-margin loss functions (or similar)?

--- EDIT ---
The suggested threshold would actually be 0 for a tanh, not 0.5 (which would be used for a sigmoid). Anyway, it's just a translation; the problem is still the same.
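To see why it is just a translation, note that $\tanh(x) \ge 0$ exactly when $x \ge 0$, which is exactly when $\text{sigmoid}(x) \ge 0.5$. A quick numeric check (a minimal sketch, assuming NumPy and the example activations above):

import numpy as np

last_hidden = np.array([1, 0.995, 0.39, -0.283, -1.033])
# Both conditions reduce to the raw activation being >= 0,
# so they select exactly the same labels.
tanh_labels = (np.tanh(last_hidden) >= 0).astype(int)                  # [1, 1, 1, 0, 0]
sigmoid_labels = (1 / (1 + np.exp(-last_hidden)) >= 0.5).astype(int)   # [1, 1, 1, 0, 0]
assert (tanh_labels == sigmoid_labels).all()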

Best Answer

The threshold for your output neuron is also a hyperparameter and can be tuned just like the others. The $0.5$ suggestion is probably for the sigmoid function, because it is symmetric around $x = 0$, where it equals $0.5$. Similarly, for tanh (check its symmetry) the suggested value is probably $0$, not $0.5$. But this is like saying your suggested neural network size is 2 layers: you should tune your threshold as well. Statistics such as the ROC curve and precision/recall curves are obtained by measuring performance while varying this threshold, and they are used to understand the behavior of the system. By the way, a more commonly suggested option for the sigmoid, for instance, is to base the threshold on your class priors.
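As a concrete illustration of treating the threshold as a hyperparameter, here is a minimal sketch (assuming NumPy and scikit-learn; the validation scores and labels are made up for the example) that sweeps candidate thresholds over held-out tanh scores and keeps, per label, the one that maximizes F1:

import numpy as np
from sklearn.metrics import f1_score

# Hypothetical validation data: tanh scores in [-1, 1] plus binary ground
# truth, one column per label. In practice these come from your network.
rng = np.random.default_rng(0)
val_scores = np.tanh(rng.normal(size=(200, 5)))
val_truth = (rng.random(size=(200, 5)) < 0.3).astype(int)

def tune_threshold(scores, truth, candidates=np.linspace(-1, 1, 201)):
    # Return the candidate threshold with the best F1 on the validation data.
    best_t, best_f1 = 0.0, -1.0
    for t in candidates:
        f1 = f1_score(truth, (scores >= t).astype(int), zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

# One threshold per label, tuned independently.
thresholds = [tune_threshold(val_scores[:, j], val_truth[:, j])
              for j in range(val_scores.shape[1])]

If you prefer the class-prior option mentioned above, you could instead, for each label, pick the threshold so that the fraction of positive predictions matches that label's empirical frequency in the training data.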
