The question is about the practical use of polynomial regression.
Let's say there is a dataset with columns A, B, T, where T is the dependent variable and A and B are independent variables. A and B contain missing values. I want to fill in the gaps with the mean and then normalize the values with the formula
$(x - u) / s$,
where $u$ is the mean and $s$ is the standard deviation.
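For concreteness, here is a minimal sketch of that fill-then-normalize step, assuming pandas and scikit-learn (neither is specified above) and a made-up toy dataset:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy data in the shape described: A and B have gaps, T is the target.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0],
    "B": [10.0, np.nan, 30.0, 40.0],
    "T": [1.5, 2.5, 3.5, 4.5],
})

# Fill the gaps with the column mean, then standardize: (x - u) / s.
X = SimpleImputer(strategy="mean").fit_transform(df[["A", "B"]])
X = StandardScaler().fit_transform(X)
print(X)
```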
Everything is clear when I use linear regression. What about polynomial?
For the quadratic case, columns $A^2$, $B^2$ and $AB$ are added. How should I fill $AB$ when the values of A and B are missing? With the product of the means? And when computing $AB$, should I multiply the normalized values, or normalize the product afterwards?
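To make the setup concrete, this is how the quadratic columns would be built from already-filled A and B with scikit-learn (an assumption, not stated above; the values are illustrative). The ordering question is then whether the input to this step should be the normalized values or the raw ones:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two already-imputed predictor columns, expanded to A, B, A^2, A*B, B^2.
X = np.array([[1.0, 10.0],
              [2.0, 25.0],
              [3.0, 30.0]])
poly = PolynomialFeatures(degree=2, include_bias=False)
X_quad = poly.fit_transform(X)
print(poly.get_feature_names_out(["A", "B"]))  # ['A' 'B' 'A^2' 'A B' 'B^2']
print(X_quad)
```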
Best Answer
First, single imputations of missing predictor values are likely to lead to bias. See van Buuren's *Flexible Imputation of Missing Data*.
Second, there is usually no need to normalize the predictor values in this type of regression.
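One quick way to see this (a numerical check I am adding, not part of the original answer): standardizing the predictors changes the fitted coefficients of an ordinary least-squares model but leaves its fitted values unchanged.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=100)

fit_raw = LinearRegression().fit(X, y)
scaler = StandardScaler().fit(X)
fit_scaled = LinearRegression().fit(scaler.transform(X), y)

# Same predictions, different coefficients.
print(np.allclose(fit_raw.predict(X), fit_scaled.predict(scaler.transform(X))))
print(fit_raw.coef_, fit_scaled.coef_)
```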
Third, for derived variables like $A^2$, $B^2$ and $AB$, van Buuren addresses exactly this situation in Section 6.4.1 of that book.
So your best choice is to do multiple imputation of the missing data on $A$ and $B$ and then just let standard design-matrix calculations produce the polynomial terms from the $A$ and $B$ values in each imputed data set.
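A rough sketch of that workflow, using scikit-learn's IterativeImputer to generate several stochastic imputations of A and B and then building the quadratic terms inside each completed data set (the data and the number of imputations are illustrative, and only the point estimates are pooled, so this omits the Rubin's-rules variance calculations):

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Simulate data with a quadratic/interaction structure, then punch holes in A and B.
rng = np.random.default_rng(0)
A = rng.normal(size=200)
B = rng.normal(size=200)
T = 1 + 2 * A - B + 0.5 * A * B + rng.normal(scale=0.1, size=200)
df = pd.DataFrame({"A": A, "B": B, "T": T})
df.loc[rng.random(200) < 0.2, "A"] = np.nan
df.loc[rng.random(200) < 0.2, "B"] = np.nan

coefs = []
for m in range(5):  # five imputed data sets
    imp = IterativeImputer(sample_posterior=True, random_state=m)
    AB = imp.fit_transform(df[["A", "B"]])  # impute only A and B
    # Let the design-matrix step create A^2, A*B, B^2 inside each completed data set.
    model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                          LinearRegression()).fit(AB, df["T"])
    coefs.append(model[-1].coef_)

print(np.mean(coefs, axis=0))  # pooled (averaged) coefficient estimates
```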