Solved – Can target encoding be performed on a multi-label classification problem?

categorical-data, categorical-encoding, dimensionality-reduction, machine-learning, multilabel

Is there a way to perform target encoding on multi-label (closed-set) problems? Target encoding is obviously used on multi-class problems all the time, but I'm wondering whether it works for multi-label problems as well, and whether there are any specialized methods for it.

A naive approach would be to create a dataset with duplicate rows, one copy per label a row contains, each with a different target value, and then perform target encoding on that, but I'm not sure how effective this would be.
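A minimal sketch of that naive approach, using hypothetical toy data (the feature values and label names are made up for illustration): each row is duplicated once per label it carries, and the expanded data is then target-encoded exactly as in the multi-class case, i.e. each category is replaced by its empirical label distribution.

```python
from collections import defaultdict

# Hypothetical toy data: one categorical feature and a set of labels per row.
rows = [
    ("red",  {"cat", "dog"}),
    ("red",  {"dog"}),
    ("blue", {"cat"}),
    ("blue", {"fish"}),
]

# Step 1: duplicate each row once per label it contains.
expanded = [(cat, label) for cat, labels in rows for label in labels]

# Step 2: ordinary multi-class target encoding on the expanded data:
# count label occurrences per category, then normalize to a distribution.
counts = defaultdict(lambda: defaultdict(int))
for cat, label in expanded:
    counts[cat][label] += 1

encoding = {
    cat: {label: n / sum(label_counts.values())
          for label, n in label_counts.items()}
    for cat, label_counts in counts.items()
}
# e.g. "red" appears in 3 expanded rows (cat once, dog twice),
# so encoding["red"] is {"cat": 1/3, "dog": 2/3}.
```

Note that the duplication skews the distribution toward rows with many labels, which may or may not be what you want.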

Suggestions for alternatives to target encoding are also appreciated; I just want to reduce the mess caused by the massive cardinality of my categorical variables.

Best Answer

The approach you describe might work and is worth trying.

The other obvious approach that I am aware of is to have a separate target encoding for each label (i.e. instead of one extra column, creating several). Strictly, (number of labels) − 1 variables might suffice (e.g. in binary classification with 2 classes, one variable is enough, since the other class is implicitly "not the first"), but it is worth also trying the full number of labels, as otherwise the model needs to learn to sum the other encodings up.
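The per-label approach can be sketched as follows, again on hypothetical toy data: since multi-label classification is a set of binary problems, each label gets its own binary target encoding, namely the fraction of rows with that category value that carry the label.

```python
from collections import defaultdict

# Hypothetical toy data: one categorical feature and a set of labels per row.
rows = [
    ("red",  {"cat", "dog"}),
    ("red",  {"dog"}),
    ("blue", {"cat"}),
    ("blue", {"fish"}),
]
labels = sorted({l for _, ls in rows for l in ls})  # ["cat", "dog", "fish"]

# For each (category, label) pair, estimate P(label present | category):
# one binary target encoding per label, giving one new column per label.
totals = defaultdict(int)
hits = defaultdict(lambda: defaultdict(int))
for cat, ls in rows:
    totals[cat] += 1
    for l in ls:
        hits[cat][l] += 1

encoded = {
    cat: [hits[cat][l] / totals[cat] for l in labels]
    for cat in totals
}
# encoded["red"] is [0.5, 1.0, 0.0]: half of the "red" rows carry "cat",
# all of them carry "dog", none carry "fish".
```

In practice you would also want smoothing toward the global label frequencies (as in standard target encoding) to avoid overfitting rare categories.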

Since multi-label classification (as opposed to multi-class) is rather hard for non-neural-network models, I guess you'll be using a neural network. If so, there are some things you can do quite easily in neural networks that might help it along: for each output, you could explicitly include a regression equation over the target encoding (or encodings, if you have multiple variables) for that label, plus inputs from the rest of the network (to which you might feed the target encoding as an input as well). Of course, if your categorical variables have many categories, then embeddings are a super-attractive alternative for representing categories in neural networks (you can try one, the other, or both at the same time). And, in general, it can be worth using the embeddings a neural network has learnt as features for other model types.
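To make the embedding alternative concrete, here is a bare NumPy sketch of what an embedding layer (e.g. `nn.Embedding` in PyTorch or `tf.keras.layers.Embedding`) does; the vocabulary and dimension below are arbitrary illustrative choices. Each category is mapped to an integer index into a trainable lookup table, so the per-category vector is learnt end-to-end instead of being derived from the targets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary: map each category to an integer index.
vocab = {"red": 0, "blue": 1, "green": 2}
embedding_dim = 4

# An embedding layer is just a trainable lookup table, one row per category.
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

def embed(categories):
    """Replace each category with its current dense vector."""
    idx = np.array([vocab[c] for c in categories])
    return embedding_table[idx]  # shape: (batch_size, embedding_dim)

batch = embed(["red", "blue", "red"])
# batch has shape (3, 4); during training, backpropagation would update
# exactly the rows of embedding_table that were looked up.
```

Unlike target encoding, the table rows are not statistics of the labels but free parameters, so they can capture whatever structure helps the downstream layers, and they can afterwards be exported as features for other model types, as mentioned above.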
