Solved – Quadratic terms in regression with categorical variables

categorical data, multiple regression, regression

I am working for the first time on predicting a continuous dependent variable in a problem where all independent variables are categorical, using Python's statsmodels. I would like to add all possible quadratic terms to the model 'y ~ C(x1) + C(x2) + C(x3)'. What is the right notation for that?

EDIT: One of my categorical variables is age, which I binned into four bins. All the others are transformed into dummy variables. So the idea was to square age.

Best Answer

Quadratic terms for categorical variables are undefined because you cannot square a categorical variable. Even after dummy coding, squaring a 0/1 dummy just returns the same dummy, so the squared term adds nothing. On the other hand, if you have a continuous variable (such as age) with a nonlinear relationship to your outcome/dependent variable, categorizing it can capture that nonlinearity in much the same way a quadratic term would, with the added benefit of a model that is simpler to build and explain.
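
To make this concrete, here is a minimal sketch on simulated data. The column names, effect sizes and quartile labels are all made up for illustration, and none of this is the asker's actual data. It checks that a squared dummy is identical to the dummy, then fits three alternatives: age kept continuous with a squared term via `I(age**2)`, age binned into four categories, and, as an addition not mentioned in the answer, pairwise interactions between the categorical variables via the formula operator `**2`. Those interactions are the closest analogue of "all quadratic terms" for categorical predictors.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (hypothetical; for illustration only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.choice(["a", "b"], size=n),
    "x2": rng.choice(["u", "v", "w"], size=n),
    "age": rng.uniform(20, 70, size=n),
})
# Outcome with a curved (quadratic) dependence on age plus an x1 effect.
df["y"] = (
    0.05 * (df["age"] - 45) ** 2
    + 2.0 * (df["x1"] == "b")
    + rng.normal(size=n)
)

# Squaring a 0/1 dummy returns the same dummy, so the term is redundant.
dummy = (df["x1"] == "b").astype(int)
dummy_square_is_redundant = bool((dummy ** 2 == dummy).all())

# Option 1: keep age continuous and add a squared term.
# I(...) makes the formula parser treat ** as arithmetic.
quad = smf.ols("y ~ C(x1) + C(x2) + age + I(age**2)", data=df).fit()

# Option 2: bin age into four categories (quartiles here, an assumption)
# and let the bin dummies pick up the nonlinearity.
df["age_bin"] = pd.qcut(df["age"], 4, labels=["q1", "q2", "q3", "q4"])
binned = smf.ols("y ~ C(x1) + C(x2) + C(age_bin)", data=df).fit()

# Option 3: main effects plus all pairwise interactions of the
# categorical variables; in a formula, (a + b)**2 expands to a + b + a:b.
inter = smf.ols("y ~ (C(x1) + C(x2))**2", data=df).fit()

print(dummy_square_is_redundant)
print(quad.params)
print(binned.params)
print(inter.params)
```

With binned age there is nothing left to square, so the choice is between options 1 and 2 for age. Option 3 only applies if interactions among the categorical predictors are of interest.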