Solved – Are normally distributed X and Y more likely to result in normally distributed residuals?

assumptions, data transformation, normal distribution, regression, residuals

Here the misinterpretation of the normality assumption in linear regression is discussed (namely, that the 'normality' refers to the X and/or Y rather than to the residuals), and the poster asks whether it is possible to have non-normally distributed X and Y and still have normally distributed residuals.

My question is: are normally distributed X and Y more likely to result in normally distributed residuals? There have been many related posts, but I don't believe anyone has asked this question specifically.

I realise this is perhaps a trivial point if there is only one regression to run, but less so if there are many. Say I have 100 X variables, all with the same skew, and I want to test them all. If I transformed them all to a normal distribution, would I be likely to have fewer X variables needing re-examination (with a different transformation, or none) due to non-normally distributed residuals, or would a pre-regression transformation be totally arbitrary?

Best Answer

No. The residuals are the $Y$ values conditional on $X$ (minus the predicted mean of $Y$ at each point in $X$). You can change $X$ any way you'd like ($X + 10$, $X^{-1/5}$, $X/\pi$) and the $Y$ values that correspond to the $X$ values at a given point in $X$ will not change. Thus, the conditional distribution of $Y$ (i.e., $Y | X$) will be the same. That is, it will be normal or not, just as before. (To understand this topic more fully, it may help you to read my answer here: What if residuals are normally distributed, but Y is not?)

What changing $X$ may do (depending on the nature of the data transformation you use) is change the functional relationship between $X$ and $Y$. With a non-linear change in $X$ (e.g., to remove skew), a model that was properly specified before will become misspecified. Non-linear transformations of $X$ are often used to linearize the relationship between $X$ and $Y$, to make the relationship more interpretable, or to address a different theoretical question.
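
To see this misspecification in action, here is a minimal sketch (my own simulated data, not from the original question): $y$ is truly linear in a skewed $x$, so removing the skew with a log transformation induces a pattern in the residuals.

set.seed(1)
x = rlnorm(200)                    # a positively skewed predictor
y = 2 + 3*x + rnorm(200)           # y is truly linear in x, with normal errors

mod.raw = lm(y ~ x)                # correctly specified
mod.log = lm(y ~ log(x))           # skew removed, but model now misspecified

plot(fitted(mod.raw), resid(mod.raw))  # no pattern in the residuals
plot(fitted(mod.log), resid(mod.log))  # clear curvature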

For more on how non-linear transformations can change both the model and the questions the model answers (with an emphasis on the log transformation), it may help you to read the excellent CV threads on that topic.

Linear transformations can change the values of your parameters, but do not affect the functional relationship. For example, if you center both $X$ and $Y$ before running the regression, the intercept, $\hat \beta_0$, will become $0$. Likewise, if you divide $X$ by a constant (say to change from centimeters to meters) the slope will be multiplied by that constant (e.g., $\hat \beta_{1{\rm\ (m)}} = 100 \times \hat \beta_{1{\rm\ (cm)}}$, that is $Y$ will rise 100 times as much over 1 meter as it will over 1 cm).
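
Here is a quick sketch of those facts with simulated data (the variable names and units are hypothetical): the slope rescales, the intercept becomes $0$ after centering, and the residuals are untouched.

set.seed(1)
x.cm = rnorm(100, mean=170, sd=10)   # a predictor measured in centimeters
y    = 5 + 0.02*x.cm + rnorm(100, sd=0.5)

coef(lm(y ~ x.cm))[2]                # slope per cm
coef(lm(y ~ I(x.cm/100)))[2]         # slope per m: 100 times larger

coef(lm(I(y-mean(y)) ~ I(x.cm-mean(x.cm))))[1]  # intercept ~0 after centering

all.equal(resid(lm(y ~ x.cm)),            # TRUE: the residuals are identical
          resid(lm(y ~ I(x.cm/100))))     #   under linear transformations of x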


On the other hand, non-linear transformations of $Y$ will affect the distribution of the residuals. In fact, transforming $Y$ is a common suggestion for normalizing residuals. Whether such a transformation would make them more or less normal depends on the initial distribution of the residuals (not the initial distribution of $Y$) and the transformation used. A common strategy is to optimize over the parameter $\lambda$ of the Box-Cox family of transformations. A word of caution is appropriate here: non-linear transformations of $Y$ can make your model misspecified just as non-linear transformations of $X$ can.
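
For instance, MASS::boxcox() profiles the log-likelihood over $\lambda$. Here is a sketch with simulated data where the true transformation is the log (i.e., $\lambda = 0$); note that $Y$ must be positive for this to work.

library(MASS)
set.seed(1)
x = rnorm(100)
y = exp(1 + 0.5*x + rnorm(100))   # y is log-normal given x (and positive)

bc = boxcox(lm(y ~ x))            # plots the profile log-likelihood over lambda
bc$x[which.max(bc$y)]             # the maximizing lambda should be near 0 (log)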


Now, what if both $X$ and $Y$ are normal? In fact, even that doesn't guarantee that the joint distribution will be bivariate normal (see @cardinal's excellent answer here: Is it possible to have a pair of Gaussian random variables for which the joint distribution is not Gaussian?).
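
One classic construction (a sketch, not necessarily the example in that answer) multiplies a Gaussian variable by an independent random sign: each marginal is standard normal, but the joint distribution sits on two lines rather than forming an elliptical cloud.

set.seed(1)
x = rnorm(1000)
s = sample(c(-1, 1), 1000, replace=TRUE)  # independent random signs
y = s * x                                 # y is standard normal by symmetry

plot(x, y)   # the joint mass lies on the lines y = x and y = -x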

Of course, those may seem like rather strange possibilities. So what if the marginal distributions appear normal and the joint distribution also appears bivariate normal: does that necessitate that the residuals are normally distributed as well? As I tried to show in the answer I linked to above, if the residuals are normally distributed, the normality of $Y$ depends on the distribution of $X$. However, it is not true that the normality of the residuals is driven by the normality of the marginals. Consider this simple example (coded with R):

set.seed(9959)              # this makes the example exactly reproducible
x = rnorm(100)              # x is drawn from a normal population
y = 7 + 0.6*x + runif(100)  # the residuals are drawn from a uniform population

mod = lm(y~x)
summary(mod)
# Call:
# lm(formula = y ~ x)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -0.4908 -0.2250 -0.0292  0.2539  0.5303 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  7.48327    0.02980   251.1   <2e-16 ***
# x            0.62081    0.02971    20.9   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.2974 on 98 degrees of freedom
# Multiple R-squared:  0.8167,  Adjusted R-squared:  0.8148 
# F-statistic: 436.7 on 1 and 98 DF,  p-value: < 2.2e-16

[Figure: plots of the marginal distributions of x and y, their joint distribution, and a normal qq-plot of the residuals]

In the plots, we see that both marginals appear reasonably normal, and the joint distribution looks reasonably bivariate normal. Nonetheless, the uniformity of the residuals shows up in their qq-plot; both tails drop off too quickly relative to a normal distribution (as indeed they must).
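
If you want to reproduce plots like the ones described (the original figure is not shown here), a sketch continuing with the x, y, and mod from the code above:

par(mfrow=c(2,2))
qqnorm(x, main="x");  qqline(x)           # marginal of x looks normal
qqnorm(y, main="y");  qqline(y)           # marginal of y looks normal
plot(x, y, main="joint distribution")     # roughly elliptical cloud
qqnorm(resid(mod), main="residuals")      # tails too light for a normal
qqline(resid(mod))
par(mfrow=c(1,1))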