At the moment I am trying to make a list of the different approaches that could be used to verify the linearity of an effect. In a model (Y = b0 + b1.X + etc.), I want to know whether it is acceptable to assume linearity for (X).
What I've been doing so far is to estimate a second model with a quadratic specification (Y = b0 + b1.X + b2.X^2) and (1) look at the significance of the quadratic term (b2), and (2) possibly perform a likelihood-ratio test against the linear model.
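For concreteness, the two checks described above could be sketched as follows. The simulated data, the coefficient values, and the variable names are assumptions made purely for illustration; the likelihood-ratio statistic is computed by hand from `logLik()` and compared with a chi-squared distribution with one degree of freedom.

```r
# Hypothetical data with a genuinely quadratic effect of x
set.seed(1)
n <- 200
x <- runif(n, -2, 2)
y <- 1 + 0.5 * x + 0.8 * x^2 + rnorm(n)

fit_lin  <- lm(y ~ x)            # linear specification
fit_quad <- lm(y ~ x + I(x^2))   # quadratic specification

# (1) t-test on the quadratic term b2
p_wald <- summary(fit_quad)$coefficients["I(x^2)", "Pr(>|t|)"]

# (2) likelihood-ratio test of the linear model against the quadratic one
lr_stat <- 2 * (as.numeric(logLik(fit_quad)) - as.numeric(logLik(fit_lin)))
p_lr    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(c(p_wald = p_wald, lr_stat = lr_stat, p_lr = p_lr))
```

Both p-values address the same nested comparison, so they should normally lead to the same conclusion.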
However, I fear that this relatively simple approach can be misleading in some circumstances, especially when the pattern of non-linearity is not close to a quadratic shape. Indeed, it fails to reject the assumption of linearity when I simulate data that follow an S-shaped curve.
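A minimal sketch of why this can happen, under assumptions that are mine rather than taken from my actual simulation: if the S-shaped curve is odd-symmetric around the centre of X (here a hypothetical `tanh(2 * x)` on a symmetric grid, with no noise added), the quadratic term has nothing to pick up, so its estimate is essentially zero even though the relationship is clearly non-linear.

```r
# Hypothetical noise-free S-shaped relationship on a grid symmetric around 0
x <- seq(-3, 3, length.out = 201)
y <- tanh(2 * x)

fit_quad <- lm(y ~ x + I(x^2))

# Estimate and p-value of the quadratic term
b2   <- unname(coef(fit_quad)["I(x^2)"])
p_b2 <- summary(fit_quad)$coefficients["I(x^2)", "Pr(>|t|)"]

print(c(b2 = b2, p_b2 = p_b2))
```

With noise added, the same symmetry argument holds only in expectation, so the test on b2 would still have little power against this kind of departure from linearity.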
What approaches (other than a polynomial specification plus a likelihood-ratio test) would you recommend? Ideally I am looking for a test rather than a simulation-based approach, and for something that also works for non-nested models (unlike the LR test).
I came across the Vuong test (https://en.wikipedia.org/wiki/Vuong%27s_closeness_test), but I am sure there is more to know on this issue. Thanks for your help!
Best Answer
If you want to see if the relationship between (the conditional expectation of) $y$ and $x_0$ is linear, after adjusting for control variables $x_1, x_2, \dots, x_p$, a simple graphical approach is to create an added-variable plot using the following procedure.
First, regress $y$ on $x_1, x_2, \dots, x_p$ and obtain the residuals from that regression, $\hat{\epsilon}_y$. Then, regress $x_0$ on $x_1, x_2, \dots, x_p$ and obtain the residuals from that regression, $\hat{\epsilon}_{x_0}$.
Then, create a scatter plot of $\hat{\epsilon}_y$ against $\hat{\epsilon}_{x_0}$ and overlay a nonparametric curve (e.g. loess) along with the linear regression line. By the Frisch-Waugh theorem, the slope of this regression line is exactly equal to the coefficient on $x_0$ in the "long" regression of $y$ on all of $x_0, x_1, \dots, x_p$. The nonparametric curve gives you a sense of how well the relationship between $y$ and $x_0$ can be approximated as linear: the closer it stays to the straight line, the more reasonable the linearity assumption.
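The slope equality can be checked numerically. The sketch below uses simulated data with two controls; the data-generating process (including the `sin(x0)` effect) and all variable names are assumptions chosen only for illustration.

```r
# Hypothetical data: x0 is correlated with the controls x1 and x2
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x0 <- 0.5 * x1 - 0.3 * x2 + rnorm(n)
y  <- 1 + sin(x0) + 0.7 * x1 - 0.4 * x2 + rnorm(n, sd = 0.5)

# Residualize y and x0 on the controls
e_y  <- resid(lm(y  ~ x1 + x2))
e_x0 <- resid(lm(x0 ~ x1 + x2))

# Slope from the residual-on-residual regression
slope_partial <- unname(coef(lm(e_y ~ e_x0))["e_x0"])

# Coefficient on x0 from the "long" regression
slope_long <- unname(coef(lm(y ~ x0 + x1 + x2))["x0"])

# The two agree up to floating-point error
print(all.equal(slope_partial, slope_long))
```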
Some simple R code to demonstrate: