To add to AdamO's answer, I was taught to base my decisions about model assumptions less on formal diagnostics and more on whether failing to address a violated assumption would cause me to misrepresent my data. For a concrete example of what I mean, I simulated some data in R, created some plots, and ran some diagnostics on those data.
# lmSupport contains the lm.modelAssumptions function that I use below
require(lmSupport)
set.seed(12234)
# Create some data with a strong quadratic component
x <- rnorm(200, sd = 1)
y <- x + .75 * x^2 + rnorm(200, sd = 1)
# There is a significant linear trend
mod <- lm(y ~ x)
summary(mod)
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.7972 -0.9511 -0.1312  0.6659  5.8659 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.77981    0.10463   7.453 2.77e-12 ***
x            1.19417    0.09795  12.191  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.477 on 198 degrees of freedom
Multiple R-squared:  0.4288,  Adjusted R-squared:  0.4259 
F-statistic: 148.6 on 1 and 198 DF,  p-value: < 2.2e-16
However, when plotting the data, it's clear that the curvilinear component is an important aspect of the relationship between x and y.
pX <- seq(min(x), max(x), by = .1)
pY <- predict(mod, data.frame(x = pX))
plot(x, y, frame = F)
lines(pX, pY, col = "red")
A diagnostic test of linearity also supports our argument that the quadratic component is an important aspect of the relationship between x and y for these data.
lm.modelAssumptions(mod, "linear")
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
     0.7798       1.1942  

ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05 

Call:
 gvlma(x = model) 

                       Value   p-value                   Decision
Global Stat        180.04567 0.000e+00 Assumptions NOT satisfied!
Skewness            32.67166 1.091e-08 Assumptions NOT satisfied!
Kurtosis            23.99022 9.683e-07 Assumptions NOT satisfied!
Link Function      123.35831 0.000e+00 Assumptions NOT satisfied!
Heteroscedasticity   0.02547 8.732e-01    Assumptions acceptable.
# We should probably add the quadratic component to this model
mod <- lm(y ~ x + I(x^2))
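To double-check that the quadratic term is worth keeping, we could also compare the two fits directly. Here is a quick sketch (modLinear and modQuadratic are just illustrative names):

# Refit both models under separate names and compare them with a nested F-test
modLinear    <- lm(y ~ x)
modQuadratic <- lm(y ~ x + I(x^2))
anova(modLinear, modQuadratic)  # a significant F supports keeping the quadratic term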
Let's see what happens when we simulate data with a smaller (but still significant) nonlinear trend.
y <- x + .25 * x^2 + rnorm(200, sd = 1)
mod <- lm(y ~ x)
summary(mod)
Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.59701 -0.77446  0.03546  0.80261  2.75938 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.30500    0.07907   3.858 0.000155 ***
x            0.99934    0.07402  13.500  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.116 on 198 degrees of freedom
Multiple R-squared:  0.4793,  Adjusted R-squared:  0.4767 
F-statistic: 182.3 on 1 and 198 DF,  p-value: < 2.2e-16
If we examine a plot of these new data, it's pretty clear that they are well-represented by just the linear trend.
pX <- seq(min(x), max(x), by = .1)
pY <- predict(mod, data.frame(x = pX))
plot(x, y, frame = F)
lines(pX, pY, col = "red")
This is in spite of the fact that this model fails a diagnostic test of linearity.
lm.modelAssumptions(mod, "linear")
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
     0.3050       0.9993  

ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05 

Call:
 gvlma(x = model) 

                     Value   p-value                   Decision
Global Stat        34.6428 5.500e-07 Assumptions NOT satisfied!
Skewness            0.3355 5.624e-01    Assumptions acceptable.
Kurtosis            2.0094 1.563e-01    Assumptions acceptable.
Link Function      32.1379 1.436e-08 Assumptions NOT satisfied!
Heteroscedasticity  0.1600 6.892e-01    Assumptions acceptable.
My point is that diagnostic tests should not be a substitute for thinking on the part of the analyst; they are tools to help you understand whether your substantive conclusions follow from your analyses. For this reason, I prefer to look at different types of plots rather than rely on global tests when I'm making these sorts of decisions.
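For example, a residuals-versus-fitted plot (one sketch of such a plot, using base R's plot method for lm objects) will usually show a missed nonlinear trend as a curve in the smoother:

# Residuals vs. fitted values for the current model; a curved
# smoother suggests a nonlinear trend the model has missed
plot(mod, which = 1)
# A component + residual (partial residual) plot is another option:
# require(car); crPlots(mod)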
Best Answer
Let me explain what linearity means with nominal/dummy variables. In essence, it means that you have not left out an interaction term between your independent variables.†
Suppose we have two nominal variables $x_1$ and $x_2$, each taking values 0 or 1, and a response variable $y$. (The general case is similar.)
If we model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$:
$\beta_0$ is the expected response when $x_1 = x_2 = 0$
$\beta_0 + \beta_1$ is the expected response when $x_1 = 1, x_2 = 0$
$\beta_0 + \beta_2$ is the expected response when $x_1 = 0, x_2 = 1$
$\beta_0 + \beta_1 + \beta_2$ is the expected response when $x_1 = x_2 = 1$
There's a constraint here, since we have three coefficients but four cases: the last expected response minus the first equals the sum of the second minus the first and the third minus the first.
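Written out with the expected responses above, this constraint is
$$\operatorname{E}(Y \mid 1,1) - \operatorname{E}(Y \mid 0,0) = \bigl[\operatorname{E}(Y \mid 1,0) - \operatorname{E}(Y \mid 0,0)\bigr] + \bigl[\operatorname{E}(Y \mid 0,1) - \operatorname{E}(Y \mid 0,0)\bigr],$$
i.e., $\beta_1 + \beta_2 = \beta_1 + \beta_2$, which the three-coefficient model forces to hold.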
If this relationship actually holds in your situation between the expected responses, then this linear model can be a good one. If not, then the failure of this relationship is a type of nonlinearity.
If we include an interaction term, then linearity is automatically satisfied, because we have four coefficients to fit the four cases. That is, with the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon$ there is no restriction on the relationship between the expected responses in the four cases above. (However, the distributions of $y$ in these four cases may still differ, which would violate the model as written.)
How do you test whether you can leave out the interaction term? One way would be to try including it and test whether the coefficient $\beta_3$ is distinct from zero. For example, in the case of normal error $\epsilon$, this would be a $t$-test for a slope coefficient in a regression.
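As a minimal sketch in R (the data and the names x1 and x2 here are made up purely for illustration):

# Simulate two dummy predictors with no true interaction
set.seed(1)
x1 <- rbinom(100, 1, 0.5)
x2 <- rbinom(100, 1, 0.5)
y  <- 1 + 0.5 * x1 + 0.8 * x2 + rnorm(100)
# y ~ x1 * x2 expands to y ~ x1 + x2 + x1:x2; the t-test on the x1:x2
# row of the summary tests whether the interaction coefficient is zero
summary(lm(y ~ x1 * x2))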
† An interaction between $x_1$ and $x_2$ is a type of (multi-dimensional) nonlinearity: there's no possibility of a nonlinear relationship between $\operatorname{E}Y$ and $x_1$ when $x_1$ is a dummy variable, but there is between $\operatorname{E}Y$ and $(x_1,x_2)$. That is, there may be no plane passing through the four points $(0,0,\operatorname{E}(Y \mid 0,0))$, $(1,0,\operatorname{E}(Y \mid 1,0))$, $(0,1,\operatorname{E}(Y \mid 0,1))$, $(1,1,\operatorname{E}(Y \mid 1,1))$.
For dummy variables, these interaction terms are the only potential source of nonlinearity of the expected responses.