In Discovering Statistics Using SPSS (4th edition) by Andy Field, it is recommended to include the interaction term between an independent variable $x$ and its natural-log transform $\ln(x)$ to check for a violation of the linearity assumption. What is the statistical theory behind this?
This is a quote from the book:
This assumption can be tested by looking at whether the interaction term between the predictor and its log transformation is significant (Hosmer & Lemeshow, 1989).
I've also recently found out that this transformation is called the Box-Tidwell transformation.
Best Answer
Box and Tidwell (1962) [1] presented a fairly general approach for estimating transformations of the individual predictors (IVs), and worked through the specific case of estimating power transformations of the predictor variables, including the power 0, which, with appropriate scaling, corresponds to taking logs as a limiting case.
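For reference, one common way to write such a scaled power family is the Box-Cox-style form below (an assumption: the notation in the paper may differ in detail). The limit shows why the log appears as the $\alpha_j = 0$ member:

$$
X_j^{(\alpha_j)} =
\begin{cases}
\dfrac{X_j^{\alpha_j} - 1}{\alpha_j}, & \alpha_j \neq 0,\\[6pt]
\log X_j, & \alpha_j = 0,
\end{cases}
\qquad
\lim_{\alpha_j \to 0} \frac{X_j^{\alpha_j} - 1}{\alpha_j} = \log X_j .
$$

Because $(X_j^{\alpha_j} - 1)/\alpha_j$ is a linear function of $X_j^{\alpha_j}$, the shift and scale are absorbed into the intercept and slope, so for $\alpha_j \neq 0$ regressing on the scaled version or on $X_j^{\alpha_j}$ itself gives the same fit.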
In that particular case of power transformations, it turns out that there's a connection to regressing on $X_j\log(X_j)$.
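One way to see the connection is a first-order Taylor expansion of $X_j^{\alpha_j}$ in $\alpha_j$ about $\alpha_j = 1$. This is a sketch for a single predictor with slope $\beta_j$, assuming $X_j > 0$:

$$
\begin{aligned}
X_j^{\alpha_j} &= X_j\, e^{(\alpha_j - 1)\log X_j} \approx X_j + (\alpha_j - 1)\, X_j \log X_j,\\
\beta_0 + \beta_j X_j^{\alpha_j} &\approx \beta_0 + \beta_j X_j + \gamma_j\, X_j \log X_j,
\qquad \gamma_j = \beta_j(\alpha_j - 1).
\end{aligned}
$$

To this order, the coefficient $\gamma_j$ on $X_j\log X_j$ is zero exactly when $\alpha_j = 1$, and $\gamma_j / \beta_j \approx \alpha_j - 1$.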
So if you have nonlinearity of the kind where the true (conditional) relationship between $Y$ and $X_j$ is linear in $X_j^{\alpha_j}$, then this term can be used to check for $\alpha_j\neq 1$, or indeed to estimate the $\alpha_j$ values.
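As a rough illustration of the check, the pure-Python sketch below simulates two made-up datasets, one whose true relationship is linear in $\sqrt{X}$ (so $\alpha = 0.5$) and one that is linear in $X$ itself ($\alpha = 1$). It regresses $Y$ on $X$ and $X\log X$ (with an intercept) and reports the ordinary least-squares $t$-statistic of the $X\log X$ coefficient. The helper names `ols` and `xlogx_t_stat` are hypothetical; in practice you would use a statistics package.

```python
import math
import random


def ols(columns, y):
    """OLS with an intercept via the normal equations.

    Returns (coefficients with intercept first, inverse of X'X, residual variance).
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | I]; Gauss-Jordan turns the right half into (X'X)^-1.
    a = [[sum(X[k][r] * X[k][c] for k in range(n)) for c in range(p)]
         + [1.0 if r == c else 0.0 for c in range(p)] for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        piv = a[col][col]
        a[col] = [v / piv for v in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    inv = [row[p:] for row in a]
    xty = [sum(X[k][r] * y[k] for k in range(n)) for r in range(p)]
    beta = [sum(inv[r][c] * xty[c] for c in range(p)) for r in range(p)]
    resid = [y[k] - sum(X[k][c] * beta[c] for c in range(p)) for k in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # unbiased residual variance
    return beta, inv, sigma2


def xlogx_t_stat(x, y):
    """t-statistic of the X*log(X) coefficient when Y is regressed on X and X*log(X)."""
    xlx = [xi * math.log(xi) for xi in x]  # assumes every x > 0
    beta, inv, sigma2 = ols([x, xlx], y)
    return beta[2] / math.sqrt(sigma2 * inv[2][2])


random.seed(1)
x = [random.uniform(1, 10) for _ in range(200)]
y_sqrt = [2 + 3 * math.sqrt(xi) + random.gauss(0, 0.1) for xi in x]  # true alpha = 0.5
y_lin = [2 + 3 * xi + random.gauss(0, 0.1) for xi in x]  # true alpha = 1

t_sqrt = xlogx_t_stat(x, y_sqrt)
t_lin = xlogx_t_stat(x, y_lin)
print(round(t_sqrt, 1), round(t_lin, 1))
```

The first statistic should be large in magnitude, while the second should typically be small. Solving the normal equations directly is adequate for these small, well-scaled examples, but it is not how dedicated regression software computes fits.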
Specifically, when regressing on $X_j$ and $X_j\log(X_j)$, the coefficient of the second term divided by that of the first is an approximate estimate of $\alpha_j-1$. (This estimate can be iterated to convergence.)
If the estimated $\alpha_j$ is close to 1, there's little indication of a need to transform.
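The iteration can be sketched the same way. This is an assumed variant, not necessarily the exact scheme in the paper: at each step it regresses $Y$ on $W = X^{\hat\alpha}$ and $W\log X$. By the same Taylor argument, expanded about the current $\hat\alpha$ instead of about 1, $X^{\alpha} = W e^{(\alpha - \hat\alpha)\log X} \approx W + (\alpha - \hat\alpha)\, W \log X$, so the coefficient ratio estimates $\alpha - \hat\alpha$ and is added to $\hat\alpha$. The function names and the simulated data are hypothetical; the sketch assumes $X > 0$ and that $\hat\alpha$ stays away from 0, where $W$ becomes constant and collinear with the intercept.

```python
import math
import random


def lstsq_coefs(columns, y):
    """OLS coefficients (intercept first) from the normal equations via Gauss-Jordan."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented system [X'X | X'y].
    a = [[sum(X[k][r] * X[k][c] for k in range(n)) for c in range(p)]
         + [sum(X[k][r] * y[k] for k in range(n))] for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        piv = a[col][col]
        a[col] = [v / piv for v in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[r][p] for r in range(p)]


def box_tidwell_alpha(x, y, alpha=1.0, tol=1e-8, max_iter=100):
    """Iterate alpha <- alpha + gamma/beta, regressing Y on W = X**alpha and W*log(X)."""
    log_x = [math.log(xi) for xi in x]
    for _ in range(max_iter):
        w = [xi ** alpha for xi in x]
        w_log_x = [wi * lx for wi, lx in zip(w, log_x)]
        _, beta, gamma = lstsq_coefs([w, w_log_x], y)
        step = gamma / beta  # approximately (true alpha) - (current alpha)
        alpha += step
        if abs(step) < tol:
            break
    return alpha


random.seed(2)
x = [random.uniform(1, 10) for _ in range(500)]
y_sqrt = [2 + 3 * xi ** 0.5 + random.gauss(0, 0.05) for xi in x]  # true alpha = 0.5
y_lin = [2 + 3 * xi + random.gauss(0, 0.05) for xi in x]  # true alpha = 1

alpha_sqrt = box_tidwell_alpha(x, y_sqrt)
alpha_lin = box_tidwell_alpha(x, y_lin)
print(round(alpha_sqrt, 3), round(alpha_lin, 3))
```

Starting from $\hat\alpha = 1$, the estimates should settle close to 0.5 for the square-root data and stay close to 1 for the linear data. The first few steps undershoot, because the linear approximation is poor when $\hat\alpha$ is far from the true power, which is why iterating helps.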
Note that since both factors in the product $X_j\log(X_j)$ are functions of $X_j$, the product is simply a transformed $X_j$, so I wouldn't call it an interaction; it's just a transformed predictor. (Indeed, even if I were somehow tempted to do so, since $\log(X_j)$ is not itself included as a predictor, I still wouldn't tend to describe that second term as an interaction.)
[1]: Box, G. E. P. and Tidwell, P. W. (1962), "Transformation of the independent variables." Technometrics 4, 531-550.