There are two different ways to encode categorical variables. Say one categorical variable has n values. One-hot encoding converts it into n variables, while dummy encoding converts it into n-1 variables. If we have k categorical variables, each of which has n values, one-hot encoding ends up with kn variables, while dummy encoding ends up with kn-k variables.
I hear that with one-hot encoding, including an intercept can lead to a collinearity problem, which makes the model unsound. Some people call this the "dummy variable trap".
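To make the column counts concrete, here is a minimal sketch using plain NumPy. The helper names `one_hot` and `dummy` and the small array of integer level codes are made up for illustration; they are not part of any library.

```python
import numpy as np

def one_hot(codes, n_levels):
    """Return an (n_samples, n_levels) 0/1 matrix, one column per level."""
    return np.eye(n_levels, dtype=int)[codes]

def dummy(codes, n_levels):
    """Like one_hot, but drop the first level, which becomes the reference."""
    return one_hot(codes, n_levels)[:, 1:]

# Hypothetical data: k = 2 categorical variables, each with n = 3 levels,
# stored as integer codes (one column per variable).
n, k = 3, 2
codes = np.array([[0, 1], [1, 2], [2, 0], [1, 1]])

X_onehot = np.hstack([one_hot(codes[:, j], n) for j in range(k)])
X_dummy = np.hstack([dummy(codes[:, j], n) for j in range(k)])

print(X_onehot.shape[1])  # k * n columns
print(X_dummy.shape[1])   # k * n - k columns
```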
My questions:

Scikit-learn's linear regression model allows users to disable the intercept. So for one-hot encoding, should I always set fit_intercept=False? And for dummy encoding, should fit_intercept always be set to True? I do not see any "warning" about this on the website.

Since one-hot encoding generates more variables, does it have more degrees of freedom than dummy encoding?
Best Answer
For an unregularized linear model with one-hot encoding, yes, you need to disable the intercept (fit_intercept=False), or else you incur perfect collinearity. sklearn also allows for a ridge shrinkage penalty, and in that case this is not necessary; in fact, you should include both the intercept and all the levels. For dummy encoding you should include an intercept, unless you have standardized all your variables (including the response), in which case the intercept is zero.
The intercept is an additional degree of freedom, so in a well-specified model it all evens out.
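A small NumPy check of this, as a sketch rather than a definitive recipe: the data are randomly generated, and an explicit column of ones stands in for what `fit_intercept=True` would add. One-hot without an intercept and dummy coding with an intercept span the same column space, so they give the same least-squares fit, while one-hot plus an intercept has one more column than its rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n_levels, n_samples = 4, 40
codes = np.arange(n_samples) % n_levels       # every level appears
y = rng.normal(size=n_samples) + codes         # made-up response

onehot = np.eye(n_levels)[codes]               # n_levels columns
ones = np.ones((n_samples, 1))                 # explicit intercept column

X_a = onehot                                   # one-hot, no intercept
X_b = np.hstack([ones, onehot[:, 1:]])         # dummy coding + intercept
X_c = np.hstack([ones, onehot])                # one-hot + intercept

rank_a = np.linalg.matrix_rank(X_a)
rank_b = np.linalg.matrix_rank(X_b)
rank_c = np.linalg.matrix_rank(X_c)            # less than X_c.shape[1]

# Least-squares fitted values for the two full-rank designs.
fit_a = X_a @ np.linalg.lstsq(X_a, y, rcond=None)[0]
fit_b = X_b @ np.linalg.lstsq(X_b, y, rcond=None)[0]
same_fit = np.allclose(fit_a, fit_b)

print(rank_a, rank_b, rank_c, X_c.shape[1], same_fit)
```

Both full-rank designs use the same number of parameters, which is the sense in which the degrees of freedom even out.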
You could not fit a model in which you used all the levels of both categorical variables, intercept or not. As soon as you have one-hot encoded all the levels of one variable in the model, say with binary variables $x_1, x_2, \ldots, x_n$, you have a linear combination of predictors equal to the constant vector
$$ x_1 + x_2 + \cdots + x_n = 1 $$
If you then try to enter all the levels of another categorical variable $x'$ into the model, you end up with a distinct linear combination equal to the same constant vector
$$ x_1' + x_2' + \cdots + x_k' = 1 $$
and so you have created a linear dependency
$$ x_1 + x_2 + \cdots + x_n - x_1' - x_2' - \cdots - x_k' = 0 $$
So you must leave out a level in the second variable, and everything lines up properly.
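The dependency can be checked numerically. The following is a sketch on a made-up full grid of level combinations, with $n = 3$ and $k = 4$ chosen arbitrarily: the weight vector with $+1$ on every $x_i$ and $-1$ on every $x_j'$ sends the design matrix to zero, and dropping one level of the second variable restores full column rank.

```python
import numpy as np

n, k = 3, 4
# Full grid of level combinations, so every level of both variables appears.
a, b = np.meshgrid(np.arange(n), np.arange(k), indexing="ij")
a, b = a.ravel(), b.ravel()

X1 = np.eye(n)[a]                       # one-hot columns x_1 .. x_n
X2 = np.eye(k)[b]                       # one-hot columns x'_1 .. x'_k

full = np.hstack([X1, X2])              # n + k columns, no intercept
reduced = np.hstack([X1, X2[:, 1:]])    # one level of the second variable dropped

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)

# Dependency from the text: +1 on each x_i, -1 on each x'_j.
w = np.concatenate([np.ones(n), -np.ones(k)])
residual = full @ w                     # should be the zero vector

print(full.shape[1], rank_full)         # more columns than rank
print(reduced.shape[1], rank_reduced)   # columns equal rank
```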
The second thing does not actually work. The $3 \times 4 = 12$ column design matrix you create will be singular. You need to remove three columns, one from each of three distinct categorical encodings, to recover nonsingularity of your design.
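As a rough numerical illustration (the full grid of level combinations here is an assumption made only so that every level appears), the 12 one-hot columns from three four-level variables are rank deficient, while dropping one column from each encoding leaves independent columns, with or without an intercept column added back.

```python
import numpy as np
from itertools import product

levels, n_vars = 4, 3
# Every combination of levels for the three variables: 4**3 = 64 rows.
grid = np.array(list(product(range(levels), repeat=n_vars)))

# One one-hot block per categorical variable.
blocks = [np.eye(levels)[grid[:, j]] for j in range(n_vars)]

X_all = np.hstack(blocks)                                   # 12 columns
X_drop = np.hstack([blk[:, 1:] for blk in blocks])          # 9 columns
X_drop_int = np.hstack([np.ones((len(grid), 1)), X_drop])   # 10 columns

rank_all = np.linalg.matrix_rank(X_all)
rank_drop = np.linalg.matrix_rank(X_drop)
rank_drop_int = np.linalg.matrix_rank(X_drop_int)

print(X_all.shape[1], rank_all)            # singular: rank below column count
print(X_drop.shape[1], rank_drop)          # full column rank
print(X_drop_int.shape[1], rank_drop_int)  # still full rank with an intercept
```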