I have not been able to find an answer to this in other discussions or in my readings.
Say I am modeling carVal (i.e., a car's value) based on mpg (numeric variable) and type (factor variable with levels 0 = sedan, 1 = van, 2 = truck, 3 = suv) using a glm(). I have read that if I am using some algorithm to select the "best" model features, it is not appropriate to drop some of a factor's dummy variables but keep the others (i.e., carVal ~ mpg + type1 is not valid; it would have to be carVal ~ mpg + type1 + type2 + type3).
My question is: if I include an interaction term between mpg and type, is it appropriate to include the interaction for only certain levels of type, rather than for all of them?
For example, is this a valid model:
carVal ~ mpg + type1 + type2 + type3 + type1:mpg
Or, would the formula have to be the following:
carVal ~ mpg + type1 + type2 + type3 + type1:mpg + type2:mpg + type3:mpg
Here is an example of the code I am using in version 4.0.2 of R:
library(leaps)
carVal = c(1000, 15000, 1500, 2000, 2500, 5000, 8000, 9500, 11000)
mpg = c(29, 45, 20, 28, 30, 40, 35, 38, 47)
type = as.factor(c(1, 2, 2, 3, 1, 0, 1, 0, 0))
car.data = data.frame(carVal, mpg, type)
subset.model = regsubsets(x = as.formula('carVal ~ mpg + type + type:mpg'), data = car.data, method = 'exhaustive')
summary(subset.model)
Best Answer
First, I would avoid any stepwise procedures.
That said:
Normally you would just specify the model as
carVal ~ mpg * type
or equivalently:
carVal ~ mpg + type + mpg:type
Then the software will create all the necessary dummy variables and the interactions between them and mpg.
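To see what the software builds from a factor-by-numeric interaction, you can inspect the design matrix directly. The following is a minimal sketch using the toy data from the question; the object names X_star and X_long are only illustrative, and it assumes R's default treatment contrasts.

```r
# Toy data from the question
carVal <- c(1000, 15000, 1500, 2000, 2500, 5000, 8000, 9500, 11000)
mpg    <- c(29, 45, 20, 28, 30, 40, 35, 38, 47)
type   <- factor(c(1, 2, 2, 3, 1, 0, 1, 0, 0))
car.data <- data.frame(carVal, mpg, type)

# Design matrix for the shorthand and for the written-out formula
X_star <- model.matrix(carVal ~ mpg * type, data = car.data)
X_long <- model.matrix(carVal ~ mpg + type + mpg:type, data = car.data)

# One dummy per non-reference level of type, plus one mpg slope
# adjustment per non-reference level
colnames(X_star)
# "(Intercept)" "mpg" "type1" "type2" "type3"
# "mpg:type1" "mpg:type2" "mpg:type3"

# Both formulas expand to the same set of columns
identical(colnames(X_star), colnames(X_long))
```

Level 0 (sedan) is the reference level here, so it gets no dummy and no interaction column of its own.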
It looks from your question that you might be creating the dummy variables yourself. In that case there is no technical reason why you can't omit some of the interactions if you have good reason to, but in my experience this can create all kinds of problems, such as a rank-deficient model matrix or an overfitted model that generalises extremely poorly to new data.
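If you do build the dummies by hand, one simple check for rank deficiency is to compare the QR rank of the design matrix with its number of columns. This is only a sketch on the question's toy data: the dummy names type1, type2 and type3 follow the question, while partial and full are illustrative names, and the check says nothing about overfitting.

```r
# Toy data from the question
carVal <- c(1000, 15000, 1500, 2000, 2500, 5000, 8000, 9500, 11000)
mpg    <- c(29, 45, 20, 28, 30, 40, 35, 38, 47)
type   <- factor(c(1, 2, 2, 3, 1, 0, 1, 0, 0))
car.data <- data.frame(carVal, mpg, type)

# Hand-made 0/1 dummies, with level 0 as the reference
car.data$type1 <- as.numeric(car.data$type == "1")
car.data$type2 <- as.numeric(car.data$type == "2")
car.data$type3 <- as.numeric(car.data$type == "3")

# Design with only one of the three interactions (hand-built dummies)
partial <- model.matrix(carVal ~ mpg + type1 + type2 + type3 + type1:mpg,
                        data = car.data)

# Design with all interactions, built by R from the factor
full <- model.matrix(carVal ~ mpg * type, data = car.data)

# A design is rank-deficient when rank < number of columns
c(columns = ncol(partial), rank = qr(partial)$rank)
c(columns = ncol(full),    rank = qr(full)$rank)
```

On this particular toy data, type 3 has a single observation, so its mpg slope cannot be separated from its intercept shift, and the full design should come out one rank short. That is a property of the tiny sample rather than of the formula; with more observations per level the full design would normally be of full rank.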
So if you want an interaction, just use mpg * type; it will make your life much easier.