What is the difference between binary encoding and one-hot encoding for categorical input variables derived from English text, and how does each affect a neural network?
Can anyone help me find a scientific paper on this problem?
Solved – Binary Encoding vs One-hot Encoding
categorical-encoding, classification, machine learning, neural networks
Related Question
- Solved – “one-hot” encoding called in scientific literature
- Solved – Should one-hot output encoding be used in backpropagation
- Solved – dummy vs one-hot encoding – ML for prediction
- Solved – Label encoding vs Dummy variable/one hot encoding – correctness
- Solved – Do ordinal variables require one hot encoding
- Solved – Compact notation for one-hot indicator vectors
Best Answer
If you have a system with $n$ different (ordered) states, the binary encoding of a given state is simply its $\text{rank number} - 1$ in binary format (e.g. the $k$th state is encoded as $k - 1$ in binary), which needs only $\lceil \log_2 n \rceil$ bits. The one-hot encoding of this $k$th state is a vector of length $n$ with a single high bit (1) at the $k$th place and all other bits low (0).
As an example, consider the encodings for the following system (levels of education):
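A minimal Python sketch of both encodings may help here. It assumes a hypothetical ordered list of four education levels; the level names and the helper functions `binary_encode` and `one_hot_encode` are illustrative and not taken from the original answer.

```python
def binary_encode(k: int, n: int) -> list[int]:
    """Binary encoding of the k-th of n ordered states (k is 1-based): k - 1 in base 2."""
    if not 1 <= k <= n:
        raise ValueError("k must be in 1..n")
    # Number of bits needed to represent the largest code, n - 1 (at least one bit).
    width = max(1, (n - 1).bit_length())
    return [int(bit) for bit in format(k - 1, f"0{width}b")]


def one_hot_encode(k: int, n: int) -> list[int]:
    """One-hot encoding: a length-n vector with a single 1 at the k-th place (1-based)."""
    if not 1 <= k <= n:
        raise ValueError("k must be in 1..n")
    return [1 if position == k else 0 for position in range(1, n + 1)]


# Hypothetical ordered levels of education, for illustration only.
levels = ["primary", "secondary", "bachelor", "master"]

for rank, name in enumerate(levels, start=1):
    print(name, binary_encode(rank, len(levels)), one_hot_encode(rank, len(levels)))
```

With four states, the binary code uses 2 inputs per variable while the one-hot code uses 4, so the two encodings lead to different input-layer sizes for the network.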
References: One hot encoding at Wikipedia
A 2017 paper in the International Journal of Computer Applications that compares the effects of different encodings on neural networks could be a good starting point: A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers