Binary variables
No encoding is needed: use them as is.
Nominal data
When you have a variable that can take on a finite number of values, that's called a categorical variable. When the values can't be ordered (e.g., red, blue, green), that's called a nominal variable. A nominal variable is one kind of categorical variable.
For nominal variables, the usual way to encode them is with a one-hot encoding. If there are $N$ possible values for the variable, you map each value to an $N$-vector that has a $1$ in the position corresponding to that value and $0$ elsewhere.
For instance: red $\mapsto (1,0,0)$, blue $\mapsto (0,1,0)$, green $\mapsto (0,0,1)$.
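The mapping above can be sketched in a few lines of plain Python; the category list and its ordering here are assumptions for illustration:

```python
def one_hot(value, categories):
    """Map a nominal value to an N-vector with a 1 in its position, 0 elsewhere."""
    return [1 if c == value else 0 for c in categories]

colors = ["red", "blue", "green"]
one_hot("blue", colors)  # [0, 1, 0]
```

In practice you would typically use a library routine (e.g. a one-hot encoder from a machine-learning toolkit), which also handles unseen categories and sparse output.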
Ordinal data
When you have a categorical variable where the values can be ordered (sorted), but the ordering doesn't imply anything about how much they differ, that's called an ordinal variable (see ordinal data).
For example, suppose you have a ranking: John finished in 3rd place, Jane in 6th place. You know that John finished before Jane, but that doesn't necessarily mean that John was $6/3=2$ times as fast as Jane.
You can encode ordinal data using thermometer encoding. If there are $N$ possible values for the variable, then you map each value to an $N$-vector, where you put a $1$ in the position that matches the value of the variable and in all subsequent positions.
For instance: first place $\mapsto (1,1,1)$, second place $\mapsto (0,1,1)$, third place $\mapsto (0,0,1)$.
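A minimal sketch of that encoding, assuming ranks are 1-based integers:

```python
def thermometer(rank, n):
    """Map a 1-based ordinal rank to an n-vector with 1s from the rank's position onward."""
    return [1 if i >= rank else 0 for i in range(1, n + 1)]

thermometer(2, 3)  # [0, 1, 1]
```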
You can also apply binning if $N$ is too large, but usually it's better not to do that.
Numerical variables
Finally, you may encounter variables whose values are genuine numbers: they can be not only ordered, but also meaningfully subtracted or divided. Then, it's typically best to use the number directly, or possibly use the logarithm of the number. (You might take the logarithm if the number represents a ratio, or if there is a very wide range of values.)
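As a small illustration of the log option, here is a sketch (the helper name and the wide-range example values are my own, not from any library):

```python
import math

def encode_numeric(x, use_log=False):
    """Use the raw number, or its natural log when values span a wide range
    or represent ratios (requires x > 0)."""
    return math.log(x) if use_log else float(x)

# Values spanning several orders of magnitude become comparable on a log scale:
[round(encode_numeric(v, use_log=True), 2) for v in (10, 1000, 100000)]
```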
Useful background
To understand these terms, it's helpful to learn about "levels of measurement": https://en.wikipedia.org/wiki/Level_of_measurement.
Scaling
Finally, when you're using neural networks or "deep learning", you'll normally want to standardize/rescale all numerical attributes before training. I suggest you treat that as a separate process from the feature mappings mentioned above, to be performed after you apply the feature mapping.
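Standardizing here means rescaling each attribute to zero mean and unit variance (z-scores). A self-contained sketch using only the standard library:

```python
import statistics

def standardize(values):
    """Rescale a numeric column to zero mean and unit variance (z-scores).
    Apply this after the feature mappings above, fitting mu/sigma on training data only."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

standardize([10.0, 20.0, 30.0])  # mean 0, unit variance
```

In a real pipeline you would compute `mu` and `sigma` on the training set and reuse them for validation/test data, to avoid leakage.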
Best Answer
Yes, this sounds like label-encoding (a machine-learning term I never encountered in Statistics) and doesn't make much sense for unordered categorical variables. If the algorithm cannot cope with dummies, maybe try some variant of target/mean encoding (mentioned here).
Alternatively: first fit a linear model (maybe glmnet) with regularization appropriate for a categorical variable with many levels (see Principled way of collapsing categorical variables with many levels?), and then encode the categorical variable with that model's estimated coefficients for its levels. That at least should be worth a try.
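To make the target/mean-encoding idea concrete, here is a deliberately simple, unsmoothed sketch (real implementations add smoothing and cross-fold fitting to reduce target leakage; the function name is my own):

```python
from collections import defaultdict

def mean_encode(categories, targets):
    """Replace each category with the mean target value observed for it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[c] for c in categories]

mean_encode(["a", "a", "b"], [1.0, 3.0, 10.0])  # [2.0, 2.0, 10.0]
```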