Solved – What exactly is multi-hot encoding and how is it different from one-hot

categorical-encoding

In one-hot encoding there is one bit reserved for each word we desire to encode.

How is multi-hot encoding different from one-hot? In what scenarios would it make sense to use it over one-hot?

Best Answer

Imagine you have five different classes, e.g. ['cat', 'dog', 'fish', 'bird', 'ant']. With one-hot encoding you would represent the presence of 'dog' as a five-dimensional binary vector like [0,1,0,0,0]. With multi-hot encoding as described here (a scheme that is also widely called binary encoding) you would first label-encode your classes, so that a single number represents each class (e.g. 1 for 'dog'), and then convert those numerical labels to binary vectors of size $\lceil \log_2 5 \rceil = 3$.

Examples:

'cat'  = [0,0,0]  
'dog'  = [0,0,1]  
'fish' = [0,1,0]  
'bird' = [0,1,1]  
'ant'  = [1,0,0]   
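
The table above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names `one_hot` and `binary_code` are made up for illustration and are not taken from any library, and the class order in the list is assumed to define the label encoding (index 0 for 'cat', 1 for 'dog', and so on).

```python
classes = ['cat', 'dog', 'fish', 'bird', 'ant']

def one_hot(label, classes):
    """One bit per class; exactly one bit is set."""
    return [1 if c == label else 0 for c in classes]

def binary_code(label, classes):
    """Label-encode the class, then write the label in binary.

    The width is ceil(log2(n)) bits, computed with integer arithmetic
    to avoid floating-point rounding (at least 1 bit is always used).
    """
    width = max(1, (len(classes) - 1).bit_length())
    index = classes.index(label)  # label encoding: position in the list
    return [int(bit) for bit in format(index, f'0{width}b')]

for name in classes:
    print(name, one_hot(name, classes), binary_code(name, classes))
```

For five classes the one-hot vectors have length 5 and the compact codes have length 3, matching the examples listed above.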

This representation is a middle ground between label encoding and one-hot encoding. Label encoding needs only a single value to represent a class, but it introduces false ordinal relationships (0 < 1 < 2 < ... < 4, thus 'cat' < 'dog' < ... < 'ant'). One-hot encoding introduces no false relationships, but it needs a vector of size $n$ (which can be huge!) to represent all $n$ classes.
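
To get a feeling for the size trade-off, the following sketch compares how many values each scheme needs per sample for a given number of classes. The function name `encoding_widths` is hypothetical and only serves this illustration.

```python
def encoding_widths(n_classes):
    """Number of values per sample for each encoding of n_classes classes."""
    return {
        'label': 1,                                       # a single integer
        'binary': max(1, (n_classes - 1).bit_length()),   # ceil(log2(n)) bits
        'one_hot': n_classes,                             # one bit per class
    }

for n in (5, 1000, 1_000_000):
    print(n, encoding_widths(n))
```

With a million classes the compact code needs 20 bits, whereas a one-hot vector needs a million entries.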

Note: multi-hot-encoding introduces false additive relationships, e.g. [0,0,1] + [0,1,0] = [0,1,1], that is, 'dog' + 'fish' = 'bird'. That is the price you pay for the reduced representation.
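
The additive artifact is easy to check directly. The sketch below hard-codes the codes from the examples in this answer (the dictionary name `codes` is just for illustration) and shows that the element-wise sum of two codes collides with a third class, while the sum of two one-hot vectors does not equal any single one-hot vector.

```python
classes = ['cat', 'dog', 'fish', 'bird', 'ant']
codes = {
    'cat':  [0, 0, 0],
    'dog':  [0, 0, 1],
    'fish': [0, 1, 0],
    'bird': [0, 1, 1],
    'ant':  [1, 0, 0],
}

# Element-wise sum of the compact codes for 'dog' and 'fish'.
summed = [a + b for a, b in zip(codes['dog'], codes['fish'])]
collisions = [name for name, code in codes.items() if code == summed]
print(summed, collisions)

# Same experiment with one-hot vectors: the sum has two bits set,
# so it cannot be mistaken for any single class.
one_hot = {c: [1 if c == d else 0 for d in classes] for c in classes}
summed_oh = [a + b for a, b in zip(one_hot['dog'], one_hot['fish'])]
collisions_oh = [name for name, vec in one_hot.items() if vec == summed_oh]
print(summed_oh, collisions_oh)
```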