How to check the linearity assumption

At the moment I am trying to make a list of the different approaches that could be used to verify the linearity of an effect. In a model (Y = b0 + b1.X + etc.), I want to know whether it is acceptable to assume linearity for (X).

What I've been doing so far is to estimate another model (Y = b0 + b1.X + b2.X**2) based on a quadratic specification and (1) look at the significance of the quadratic term (b2), and (2) eventually perform a log-likelihood ratio test.
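For concreteness, the two checks described above could be sketched in R as follows. This is only an illustration: the `mtcars` data and the choice of `mpg` and `wt` as Y and X are assumptions made for the example, and the chi-squared reference distribution for the likelihood ratio statistic is the usual asymptotic approximation.

```r
data(mtcars)

# Linear and quadratic specifications (mpg and wt are illustrative choices)
lin  <- lm(mpg ~ wt, data = mtcars)
quad <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# (1) t-test on the quadratic coefficient b2
p_t <- summary(quad)$coefficients["I(wt^2)", "Pr(>|t|)"]

# (2) likelihood ratio test: twice the gain in log-likelihood,
#     compared with a chi-squared distribution on 1 degree of freedom
lr_stat <- as.numeric(2 * (logLik(quad) - logLik(lin)))
p_lr    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

c(p_t = p_t, p_lr = p_lr)
```

With a single added term the two p-values will usually be close, so the second check adds little beyond the first in this simple case.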

However, I fear that this relatively simple approach could be misleading in some circumstances, especially if the pattern of non-linearity does not resemble a quadratic shape. Indeed, it fails to reject the assumption of linearity when I simulate data that follow an S-shaped curve.
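A simulation of this kind might look like the sketch below. The curve `tanh(2 * x)`, the symmetric grid, the noise level and the seed are all assumptions chosen for illustration, not the data I actually used. Because an S-shaped curve that is symmetric about the centre of the X range is an odd function, its projection on the quadratic term is zero, so the quadratic test has no power against it, whereas a cubic term picks it up.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# S-shaped signal on a grid that is symmetric around zero
x      <- seq(-2, 2, length.out = 201)
signal <- tanh(2 * x)

# Without noise, the fitted quadratic coefficient is zero up to rounding error
b2_noiseless <- unname(coef(lm(signal ~ x + I(x^2)))["I(x^2)"])

# Noisy data: compare the quadratic check with a cubic one
y     <- signal + rnorm(length(x), sd = 0.1)
quad  <- lm(y ~ x + I(x^2))
cubic <- lm(y ~ x + I(x^2) + I(x^3))

p_quad  <- summary(quad)$coefficients["I(x^2)", "Pr(>|t|)"]
p_cubic <- summary(cubic)$coefficients["I(x^3)", "Pr(>|t|)"]

c(b2_noiseless = b2_noiseless, p_quad = p_quad, p_cubic = p_cubic)
```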

What approaches (other than polynomial specification + log-likelihood ratio test) would you recommend? Ideally a formal test rather than a simulation-based approach, and something that would also work for non-nested models (unlike the LR test).

I came across the Vuong test (https://en.wikipedia.org/wiki/Vuong%27s_closeness_test), but I am sure there is more to know on this issue. Thanks for your help!

Best Answer

If you want to see whether the relationship between (the conditional expectation of) $y$ and $x_0$ is linear after adjusting for control variables $x_1, x_2, \dots, x_p$, a simple graphical approach is to create an added-variable plot using the following procedure.

First, regress $y$ on $x_1, x_2, \dots, x_p$ and obtain the residuals from that regression, $\hat{\epsilon}_y$. Then regress $x_0$ on $x_1, x_2, \dots, x_p$ and obtain the residuals from that regression, $\hat{\epsilon}_{x_0}$.

Finally, create a scatter plot of $\hat{\epsilon}_y$ against $\hat{\epsilon}_{x_0}$ and overlay a nonparametric curve (e.g. loess) along with the linear regression line. By the Frisch-Waugh theorem, the linear regression line will have exactly the same slope as the coefficient on $x_0$ in the "long" regression that includes all variables $x_0, x_1, \dots, x_p$. The nonparametric curve will give you a sense of how well the relationship between $y$ and $x_0$ can be approximated as linear.

Some simple R code to demonstrate:

    data(mtcars)
    library(ggplot2)

    # full model, with all control variables
    fullmod = lm(mpg ~ wt + vs + gear + am, data = mtcars)
    coef(fullmod)[2]
    #>     wt
    #> -3.786

    # regress y on controls and x on controls, extract residuals
    eps_y = residuals(lm(mpg ~ vs + gear + am, data = mtcars))
    eps_x = residuals(lm(wt ~ vs + gear + am, data = mtcars))

    # regress eps_y on eps_x; the slope is the same as the wt coefficient above
    coef(lm(eps_y ~ eps_x))[2]
    #>  eps_x
    #> -3.786

    # make added-variable plot: linear fit in black, loess in red
    ggplot(data.frame(eps_x, eps_y), aes(x = eps_x, y = eps_y)) +
      geom_point() +
      geom_smooth(method = "lm", colour = "black", se = FALSE) +
      geom_smooth(method = "loess", colour = "red", se = FALSE)
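As a rough numerical complement to the plot, one could also compare the linear specification with a more flexible one. The sketch below is not part of the procedure described above; it assumes a natural cubic spline in `wt` with 3 degrees of freedom (an arbitrary choice) from the `splines` package that ships with R. Since a straight line is a special case of a natural spline, the two models are nested and `anova()` gives an F-test of the extra flexibility.

```r
library(splines)
data(mtcars)

# Linear in wt versus a natural cubic spline in wt, same controls in both
lin  <- lm(mpg ~ wt + vs + gear + am, data = mtcars)
flex <- lm(mpg ~ ns(wt, df = 3) + vs + gear + am, data = mtcars)

# F-test of the two extra spline parameters; a small p-value suggests
# that the linear term alone does not capture the effect of wt
cmp     <- anova(lin, flex)
p_value <- cmp[["Pr(>F)"]][2]
print(cmp)
```

Unlike a single quadratic term, the spline can bend in more than one direction, so this comparison is less tied to one particular shape of non-linearity, although the result will depend on the chosen degrees of freedom.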
