You can definitely do that: introduce your categorical variable as a factor. If you are using R, the following code would work:
new_categ <- factor(categ, labels = c(0:2))
Then you can interact the new categorical variable with the other independent variables. You can also find examples centered on your problem in Modern Applied Statistics with S-PLUS by Venables and Ripley. Even if you are not willing to use R, its regression examples are still helpful for figuring out how to solve your problem.
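For instance, a minimal sketch with made-up data (the variable names `categ`, `x`, and `y` are hypothetical, not from your data):

```r
# Hypothetical three-level categorical, a continuous covariate, and an outcome
set.seed(42)
categ <- sample(c("low", "mid", "high"), 50, replace = TRUE)
x     <- rnorm(50)
y     <- rnorm(50)

new_categ <- factor(categ)    # R builds the dummy coding for you
fit <- lm(y ~ new_categ * x)  # main effects plus the interaction with x
summary(fit)
```

With three levels, R creates two dummies (the first level is the reference), so the fit has six coefficients: intercept, two level dummies, the slope for x, and two interaction terms.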
If you use the proper multiplicative notation, the model coefficients have to be interpreted relative to the intercept term. Assume WLOG that only color and pH are in the model (using the example you've provided). If "color==red" is the reference group, there is technically only one dummy in the model: 1 if color is white, 0 otherwise.
Then, fitting the pH interaction, the "colorwhite" parameter is interpreted as the expected difference in the outcome comparing white to red at a pH of exactly 0. The pH parameter is interpreted as the expected difference in the outcome comparing groups differing by one unit in pH within the red group. Lastly, the "colorwhite:pH" parameter is a difference in differences, i.e. the incremental change in the pH slope comparing white to red.
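Since a pH of exactly 0 is rarely meaningful, centering the covariate makes the "colorwhite" coefficient refer to the comparison at the average pH instead. A sketch with simulated data (the values of `color`, `pH`, and `y` are made up for illustration):

```r
set.seed(1)
color <- factor(sample(c("red", "white"), 100, replace = TRUE))
pH    <- rnorm(100, mean = 3.3, sd = 0.2)
y     <- rnorm(100)

fit  <- lm(y ~ color * pH)                # "colorwhite" = white - red at pH 0
fitc <- lm(y ~ color * I(pH - mean(pH)))  # "colorwhite" = white - red at mean pH
coef(fit)[4]                              # interaction slope, unchanged by centering
```

Centering only reparameterizes the intercept and the "colorwhite" main effect; the pH slope and the interaction coefficient are identical in both fits.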
I think you should rewrite your formula to replace ":" with "*": in R, a * b expands to a + b + a:b, i.e. both main effects plus the interaction, whereas a:b alone fits only the interaction.
> set.seed(1)
> a <- sample(letters[1:3], 100, replace=TRUE)
> b <- sample(LETTERS[1:3], 100, replace=TRUE)
> y <- rnorm(100)
> lm(y ~ a * b)
Call:
lm(formula = y ~ a * b)
Coefficients:
(Intercept)           ab           ac           bB           bC        ab:bB
     0.1684      -0.3894      -0.2614      -0.2807      -0.3981       0.8720
      ac:bB        ab:bC        ac:bC
     0.2099       0.6215       0.4547
> lm(y ~ a : b) ## wrong
Call:
lm(formula = y ~ a:b)
Coefficients:
(Intercept)       aa:bA       ab:bA       ac:bA       aa:bB       ab:bB
   -0.03642     0.20484    -0.18456    -0.05654    -0.07587     0.40675
      ac:bB       aa:bC       ab:bC       ac:bC
   -0.12738    -0.19331     0.03883          NA
> tapply(y, interaction(a, b), mean)
         a.A          b.A          c.A          a.B          b.B          c.B
 0.168415779 -0.220978904 -0.092958625 -0.112286696  0.370329364 -0.163796519
         a.C          b.C          c.C
-0.229732632  0.002406992 -0.036419652
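Those nine cell means are exactly what the saturated y ~ a * b fit reproduces. A sketch re-running the simulation above (the exact numbers may differ across R versions because the sampling RNG changed in R 3.6, but the equality holds either way):

```r
set.seed(1)
a <- sample(letters[1:3], 100, replace = TRUE)
b <- sample(LETTERS[1:3], 100, replace = TRUE)
y <- rnorm(100)

fit  <- lm(y ~ a * b)
grid <- expand.grid(a = letters[1:3], b = LETTERS[1:3])
# Fitted values for the 9 cells equal tapply(y, interaction(a, b), mean)
cbind(grid, fitted = predict(fit, grid))
```

This is why the a * b parameterization is just a re-expression of the cell means, while a:b without main effects produces a redundant, rank-deficient coding (hence the NA above).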
Best Answer
If by dummy variables you mean the multiple binary variables that make up one categorical predictor, all of them need to be in the model for any one of them to be meaningful. In stepwise regression they are either all in or all out, not piecemeal. Are you doing this by hand or something? All stats packages I'm familiar with treat multilevel categoricals properly in this respect and don't consider the dummy variables independently during model specification.
Again, you can't include interactions with some dummy variables of a single categorical predictor but not others: all in or all out. The test of whether the interaction is needed is a comparison between a model without any of the interaction dummies and a model with all of them. If the interaction is significant, keep all of it. Just be aware that the interpretation of the "main effects" changes drastically once interactions are included.
If doing backwards stepwise regression, start with the interaction terms included.
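The comparison described above, all interaction dummies in versus all out, is a single F-test in R via anova() on nested fits. A sketch with simulated data (`group`, `x`, and `y` are made-up names):

```r
set.seed(2)
group <- factor(sample(c("a", "b", "c"), 120, replace = TRUE))
x     <- rnorm(120)
y     <- rnorm(120)

fit0 <- lm(y ~ group + x)  # main effects only
fit1 <- lm(y ~ group * x)  # adds group:x, i.e. both interaction dummies at once
anova(fit0, fit1)          # one F-test on 2 df for the whole interaction
```

Because the two interaction dummies enter and leave together, the test has 2 degrees of freedom; looking at their individual t-tests instead would be exactly the piecemeal mistake warned against above.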