The issue with representing a categorical variable that has $k$ levels with $k$ indicator variables in a regression is that, if the model also has a constant term, then the columns will be linearly dependent and hence the model will be unidentifiable. For example, if the model is $\mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ and $X_2 = 1 - X_1$, then any choice $(\beta_0, \beta_1, \beta_2)$ of the parameter vector is indistinguishable from $(\beta_0 + \beta_2,\; \beta_1 - \beta_2,\; 0)$. So although software may be willing to give you estimates for these parameters, they aren't uniquely determined and hence probably won't be very useful.
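A quick numpy check of this claim (the three-level example below is made up for illustration):

```python
import numpy as np

# Design matrix for a 3-level categorical coded with all k = 3
# indicator columns plus an intercept: the indicators sum to the
# intercept column, so the columns are linearly dependent.
levels = np.array([0, 1, 2, 0, 1, 2])
X = np.column_stack([np.ones(len(levels)), np.eye(3)[levels]])

print(X.shape[1])                 # 4 columns...
print(np.linalg.matrix_rank(X))   # ...but rank 3
```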
Penalization will make the model identifiable, but given the above, a redundant coding will still affect the parameter values in unintuitive ways.
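A numpy sketch of why penalization restores identifiability, assuming a plain ridge penalty on every coefficient (including, for simplicity, the intercept):

```python
import numpy as np

levels = np.array([0, 1, 2, 0, 1, 2])
X = np.column_stack([np.ones(len(levels)), np.eye(3)[levels]])  # rank deficient
y = (levels == 1).astype(float)  # any response works for the illustration

# The penalized normal equations (X'X + lam*I) beta = X'y are always
# invertible for lam > 0, so the solution is uniquely determined even
# though the unpenalized least-squares problem is not.
lam = 1.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(beta)
```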
The likely effect of a redundant coding on a decision tree (or ensemble of trees) is to overweight the feature in question relative to the others: since it is represented by an extra, redundant variable, it will be chosen for splits more often than it otherwise would be.
A note on terminology:
As far as I am aware (unfortunately, a lot of blogs are written by people who overlook the subtle differences, so misinformation spreads):
One-hot encoding is exactly what you described: generating a map from each unique value in a string column to an integer.
Dummying is making K new columns (where K is the number of unique values), of which exactly one per row must be one.
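To make the distinction concrete, here is a minimal pandas sketch of the two encodings as defined above (`pd.factorize` for the integer map, `pd.get_dummies` for the K columns):

```python
import pandas as pd

s = pd.Series(["dog", "cat", "horse", "cat"])

# "One-hot encoding" in the sense used above: each unique string
# is mapped to an integer code.
codes, uniques = pd.factorize(s)
print(codes)    # [0 1 2 1]
print(uniques)  # ['dog', 'cat', 'horse']

# "Dummying": K = 3 new 0/1 columns, exactly one set per row.
print(pd.get_dummies(s))
```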
In the "dog, cat, horse" example, when using a decision tree, consider the following example. Perhaps your target variable is "has it ever meowed?". Clearly what you want your decision tree to do is be able to ask the question "is it a cat? (yes/no)".
If you one-hot encode, such that dog -> 0, cat -> 1, horse -> 2, the tree can't isolate all of the cats with one question, because decision trees always split on thresholds of the form "is feature x greater than some value t?", and cat sits between dog and horse.
If you're using logistic regression, it likewise can't assign a higher probability of meowing to cats alone, because the predicted probability is monotone in the integer code.
If you dummy, the tree can explicitly ask the question "is the column which signifies cat greater than 0.5?", thus splitting your data into cats and not-cats.
If you use logistic regression, your optimiser can learn that the coefficient related to this column should be positive.
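To illustrate both points, here is a minimal scikit-learn sketch (the animals and the meowing target are made up for the example):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Integer coding: dog -> 0, cat -> 1, horse -> 2.
animals = np.array([0, 1, 2] * 20)
has_meowed = (animals == 1).astype(int)  # only cats meow

# A single threshold split on the integer code cannot isolate cats,
# because cat = 1 sits between dog = 0 and horse = 2.
stump = DecisionTreeClassifier(max_depth=1)
stump.fit(animals.reshape(-1, 1), has_meowed)
print(stump.score(animals.reshape(-1, 1), has_meowed))  # < 1.0

# With dummy columns, one split on the "cat" column separates
# cats from not-cats perfectly.
dummies = np.eye(3)[animals]  # columns: dog, cat, horse
stump.fit(dummies, has_meowed)
print(stump.score(dummies, has_meowed))  # 1.0

# Logistic regression on the dummies learns a positive coefficient
# for the cat column (and negative ones for dog and horse).
lr = LogisticRegression().fit(dummies, has_meowed)
print(lr.coef_)
```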
Thus in my opinion, whenever you have categorical data which has no implicit ordinality, always dummy, never one-hot encode.
In the case where your data has high cardinality, this can cause problems, especially if the number of examples of each type is tiny. But this is a problem you can't really solve: you simply have information that is too detailed for the size of your training data, and using it would lead to over-fitting.
Nonetheless, one way to mitigate this is to do some manual clustering (or actual clustering), in which you make a synthetic column that can take fewer values, with many of the unique values of the original column mapping to the same value in the new column (e.g. dog, cat, horse -> mammal; pigeon, parrot, chicken -> bird), as sketched below. This makes it easier for the algorithm to learn, and if there's enough data, it can split further within each cluster.
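For example, a hypothetical mapping (the cluster dictionary below is made up) could look like this in pandas:

```python
import pandas as pd

s = pd.Series(["dog", "cat", "horse", "pigeon", "parrot", "chicken"])

# Hypothetical manual clustering: collapse six values into two.
clusters = {"dog": "mammal", "cat": "mammal", "horse": "mammal",
            "pigeon": "bird", "parrot": "bird", "chicken": "bird"}
coarse = s.map(clusters)

print(pd.get_dummies(coarse))  # only 2 dummy columns instead of 6
```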
Best Answer
It seems that "label encoding" just means using numbers for labels in a numerical vector. This is close to what is called a factor in R. If you should use such label encoding do not depend on the number of unique levels, it depends on the nature of the variable (and to some extent on software and model/method to be used.) Coding should be seen as a part of the modeling process, and not only as some preprocessing!
Similar questions have been asked before, and you can find some good questions & answers here. But in short:
If the levels are ordered, you could use numerical encoding ("label encoding"), making sure the numbers are assigned in the correct order; see the sketch after this list.
If not ordered, you need dummy variables.
For binary variables, like Sex, it does not matter if you code them as numerical 0/1 or as a factor; in both cases they will be treated the same way in a model.
If one variable has a value "not applicable" (like being pregnant for men), then see How do you deal with "nested" variables in a regression model?
If you have categories with very many levels see Principled way of collapsing categorical variables with many levels?
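A small pandas sketch of the first two cases above (the columns and level orderings are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "size":   ["small", "large", "medium", "small"],  # ordered levels
    "colour": ["red", "blue", "green", "red"],        # unordered levels
})

# Ordered: numerical codes assigned in the correct order.
size_type = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
df["size_code"] = df["size"].astype(size_type).cat.codes  # 0, 2, 1, 0

# Unordered: dummy variables.
df = pd.get_dummies(df, columns=["colour"])
print(df)
```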
Most of the theory and practice around categorical variables was developed in the context of linear models, GLMs, or at least models with some linear elements. Trees and forests are not in this class, so they might require new/different thinking, and may depend much more on the software. See for instance Dropping one of the columns when using one-hot encoding and Random Forest Regression with sparse data in Python.