The role of a categorical predictor in polynomial regression

categorical data, polynomial, r, regression

I understand that R has a function called poly() that can generate orthogonal polynomials, which is useful to apply to input variables before fitting a predictive model.

My question is: what is the role of categorical variables when we generate polynomials? Should they be excluded?

Update:

Dan, thank you for your kind response. I'm not sure I understand it completely, so let me explain the query in more detail. I'm trying to run logistic regression using glmnet on the Titanic dataset.
Let us assume a shortened set of columns:

    * class (factor with three levels: 1, 2, 3),
    * sex (factor: male, female),
    * Age (integer),
    * survived (factor and target variable: 0 or 1).

The question is: is it meaningful to create polynomial features based on these factors, e.g. class? If yes, could you please explain what it means?
I've seen examples with numeric input variables, where one can pass the entire input set to the poly() function and get polynomial features as output.
Your response is highly appreciated.

Best Answer

It's a little hard to answer without a specific example, but in general you can use orthogonal polynomials on the continuous variables and still include the categorical variables. Here's how it might work on some random data:

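# create random data: continuous X (1..50, each value twice) and a two-level factor
set.seed(1)  # make the random example reproducible
dat <- data.frame(Y = rnorm(100), X = rep(1:50, 2), Cat = factor(rep(c("A", "B"), 50)))
# create a second-order orthogonal polynomial over the unique X values
ux <- unique(dat$X)
x <- poly(ux, 2)
# insert it into the original data frame, matching each row's X to its polynomial row
dat[, paste0("ot", 1:2)] <- x[match(dat$X, ux), 1:2]
# run regression: polynomial terms, the factor, and their interaction
m <- lm(Y ~ (ot1 + ot2) * Cat, data = dat)
summary(m)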
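
To connect this to the columns listed in the question, here is a minimal sketch on simulated data. The data frame `titanic_like` and its values are made up for illustration and are not the real Titanic data, and `survived` is stored as a 0/1 integer rather than a factor for simplicity. It shows two things: a factor enters the model as 0/1 dummy columns, and squaring such a column returns the same column, so polynomial terms of a factor add nothing. The polynomial is therefore applied only to the numeric Age, while class and sex enter as they are.

```r
# Simulated stand-in for the columns in the question (hypothetical values)
set.seed(42)
n <- 300
titanic_like <- data.frame(
  class = factor(sample(1:3, n, replace = TRUE)),
  sex   = factor(sample(c("male", "female"), n, replace = TRUE)),
  Age   = sample(1:80, n, replace = TRUE)
)
titanic_like$survived <- rbinom(n, 1, 0.4)

# A factor is expanded into 0/1 dummy columns (intercept column dropped here)
dummies <- model.matrix(~ class + sex, data = titanic_like)[, -1]

# Squaring a 0/1 column gives back the same column, so a "polynomial"
# of a dummy variable carries no new information
dummy_squared_is_same <- all(dummies^2 == dummies)
print(dummy_squared_is_same)

# Orthogonal polynomial on the numeric predictor only; the interaction
# with class lets the shape of the Age curve differ between classes
fit <- glm(survived ~ poly(Age, 2) * class + sex,
           data = titanic_like, family = binomial)
summary(fit)
```

This sketch uses glm() from base R rather than glmnet so that it runs without extra packages. Since glmnet takes a numeric matrix instead of a formula, the same design could presumably be built by calling model.matrix() on the right-hand side of the formula above and passing the result as the predictor matrix.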