Solved – Compact notation for one-hot indicator vectors

categorical-encoding, notation

Many machine learning approaches use one-hot vectors to represent categorical data. This is variously called using indicator features, indicator vectors, categorical encoding, dummy coding, or one-hot encoding (among other names).

I'm searching for a compact way to denote a one-hot vector within a model.

Say we have a categorical variable with $m$ categories. First, fix some arbitrary ordering of the categories. A one-hot vector $v$ is then a binary vector of length $m$ in which exactly one entry is one and all others are zero: setting the $i^\text{th}$ entry to 1 (and all others to 0) indicates that $v$ represents the categorical variable taking its $i^\text{th}$ possible value.
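For concreteness, here is a minimal sketch of constructing such a vector in Python with NumPy (the function name is just for illustration):

```python
import numpy as np

def one_hot(i, m):
    """Return the length-m one-hot vector with a 1 in position i (0-indexed)."""
    v = np.zeros(m, dtype=int)
    v[i] = 1
    return v

# The vector for the 3rd of 5 categories (index 2 with 0-based indexing):
one_hot(2, 5)  # array([0, 0, 1, 0, 0])
```

By construction the entries are in $\{0, 1\}$ and sum to 1, matching the constraints below.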

One clunky attempt based on misguided set notation:

$$
v \in \{0, 1\}^m \qquad\qquad \sum_{i=1}^m v_i = 1
$$

I've also seen math-oriented people refer to a one-hot vector using the notation

$$
\mathbf{e}_i
$$

But I don't understand where this notation comes from or what it is called.

Can anyone help me out? Is there a paper that does a good job of this?

Thank you,

Best Answer

There are several ways to denote dummy (one-hot encoded) variables. One of them is the indicator function:

$$ \mathbb{1}_A(x) := \begin{cases} 1 &\text{if } x \in A, \\ 0 &\text{if } x \notin A. \end{cases} $$
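To connect the indicator function to one-hot vectors: the $i^\text{th}$ entry of the one-hot vector for a value $x$ is $\mathbb{1}_{\{c_i\}}(x)$, where $c_i$ is the $i^\text{th}$ category. A small Python sketch (function and variable names are just for illustration):

```python
def indicator(A):
    """Return the indicator function 1_A of the set A."""
    return lambda x: 1 if x in A else 0

categories = ["red", "green", "blue"]

# One-hot vector for x: entry i is 1_{c_i}(x)
x = "green"
v = [indicator({c})(x) for c in categories]
print(v)  # [0, 1, 0]
```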

As for $e_i$: it is a standard basis vector, where $e_i$ denotes the vector with a $1$ in the $i^\text{th}$ coordinate and $0$'s elsewhere. For example, in $\mathbb{R}^5$, $e_3 = (0, 0, 1, 0, 0)$.
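One convenient way to see this (and to get these vectors in code): the rows of the $m \times m$ identity matrix are exactly $e_1, \dots, e_m$. A NumPy illustration (note the 0-based indexing, unlike the 1-based math convention):

```python
import numpy as np

m = 5
I = np.eye(m, dtype=int)  # rows of the identity matrix are e_1, ..., e_m
e3 = I[2]                 # 0-based index 2 corresponds to the mathematical e_3
print(e3)                 # [0 0 1 0 0]
```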