Many machine learning approaches use one-hot vectors to represent categorical data. This is sometimes called using indicator features, indicator vectors, regular categorical encoding, dummy coding, or one-hot encoding (among other names).
I'm searching for a compact way to denote a one-hot vector within a model.
Say we have a categorical variable with $m$ categories. First, fix some arbitrary ordering of the categories. A one-hot vector $v$ is then a binary vector of length $m$ in which exactly one entry is 1 and all others are 0. We set the $i^\text{th}$ entry to 1, and all others to 0, to indicate that $v$ represents the categorical variable taking on the $i^\text{th}$ possible value.
One clunky attempt, based on (possibly misguided) set notation:
$$
v \in \{0, 1\}^m \qquad\qquad \sum_{i=1}^m v_i = 1
$$
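Those two conditions translate directly into code. As a minimal sketch (the function name `is_one_hot` is my own, not from any library), checking membership in $\{0,1\}^m$ together with the sum-to-one constraint looks like:

```python
def is_one_hot(v):
    """Check v is in {0,1}^m with exactly one entry equal to 1."""
    return all(x in (0, 1) for x in v) and sum(v) == 1

print(is_one_hot([0, 0, 1, 0]))  # True
print(is_one_hot([0, 1, 1, 0]))  # False: two hot entries
```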
I've also seen math-oriented people refer to a one-hot vector using the notation
$$
\mathbf{e}_i
$$
But I don't understand where this notation comes from or what it is called.
Can anyone help me out? Is there a paper that does a good job of this?
Thank you,
Best Answer
There are several ways to denote dummy (one-hot encoded) variables. One of them is the indicator function:
$$ \mathbb{1}_A(x) := \begin{cases} 1 &\text{if } x \in A, \\ 0 &\text{if } x \notin A. \end{cases} $$
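To make the definition concrete, here is a hedged sketch of $\mathbb{1}_A$ as a higher-order function (the name `indicator` is mine; any set-like container works for $A$):

```python
def indicator(A):
    """Return the indicator function 1_A: x -> 1 if x in A, else 0."""
    return lambda x: 1 if x in A else 0

one_A = indicator({2, 4, 6})
print(one_A(4))  # 1, since 4 is in A
print(one_A(5))  # 0, since 5 is not in A
```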
As for $e_i$: it is a vector of the standard basis, where $e_i$ denotes the vector with a $1$ in the $i^\text{th}$ coordinate and $0$'s elsewhere. For example, in $\mathbb{R}^5$, $e_3 = (0, 0, 1, 0, 0)$.
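Constructing a standard basis vector is a one-liner; this sketch (with a hypothetical helper name `e`, using 1-based indexing to match the math) reproduces the $\mathbb{R}^5$ example above:

```python
def e(i, m):
    """Standard basis vector e_i in R^m (1-indexed), as a list."""
    return [1 if j == i else 0 for j in range(1, m + 1)]

print(e(3, 5))  # [0, 0, 1, 0, 0]
```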