I have a simple question about Q-Q plots in simple linear regression, but I am a bit confused about the plot. Depending on where I look, the y-axis is different: it can be 1) the residuals, 2) the standardized residuals, or 3) the actual independent variable. Which one is better to use? I know that we are checking the normality assumption, i.e. that the errors are normally distributed (hence the residuals). Does that imply that my independent variable is also normally distributed? And what should the y-axis be?
I got this response in one of the chats: "They are all identical for the purpose of normality checking. If the residual, $\epsilon$, is normally distributed, then $Y = XB + \epsilon$ is normally distributed. …. But the technical condition necessary for inference (p-values) is that the residuals are normally distributed. This is called the conditional distribution of Y, conditioned on all the X's."
I am not sure I understand this explanation (except the part about residuals and standardized residuals).
Best Answer
Both styles of Q-Q plot are considered correct, and both are in common use. The default in R (`qqnorm`) is to put the 'data' or 'sample' quantiles on the vertical axis; the parameter `datax=T` puts the theoretical quantiles on the vertical axis instead. If you're showing the Q-Q plot alongside a histogram or an ECDF plot, the latter style seems natural, since the data values then run along the horizontal axis in every plot. However, if you want to draw a line around which the points of a normal sample should fall, that is easier with the default style: $y = \mu + \sigma x$. The default reference line in R (drawn with `qqline`) connects the sample and theoretical quartiles. This default style seems to be used more often outside North America.

However, you should feel free to use whichever style of Q-Q plot you prefer. (The only exception might be if you're submitting to a journal that insists on one particular style.)
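The two orientations differ only in which coordinate holds the data, and swapping them also changes the reference line. The default line $y = \mu + \sigma x$ becomes $y = (x - \mu)/\sigma$ when the data are on the horizontal axis. Below is a minimal Python sketch using only the standard library. `qq_pairs` and `reference_line` are hypothetical helpers, not R functions. The plotting positions $(i - 0.5)/n$ simplify R's `ppoints()`, which uses a slightly different offset when $n \le 10$.

```python
from statistics import NormalDist


def qq_pairs(sample, datax=False):
    """(horizontal, vertical) coordinates of a normal Q-Q plot.

    datax=False mimics R's qqnorm default: theoretical N(0, 1) quantiles on the
    horizontal axis, sorted sample on the vertical axis. datax=True swaps them.
    """
    n = len(sample)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    data = sorted(sample)
    if datax:
        return list(zip(data, theoretical))
    return list(zip(theoretical, data))


def reference_line(mu, sigma, datax=False):
    """(slope, intercept) of the line that points from N(mu, sigma) scatter around.

    Default style: sample quantile ~ mu + sigma * z, i.e. y = mu + sigma * x.
    datax style:   z ~ (sample - mu) / sigma,        i.e. y = x / sigma - mu / sigma.
    """
    if datax:
        return 1 / sigma, -mu / sigma
    return sigma, mu


# Hypothetical three-point sample, just to show the coordinate swap.
sample = [115.0, 85.0, 100.0]
default = qq_pairs(sample)
swapped = qq_pairs(sample, datax=True)
print(default)
print(swapped)
print(reference_line(100, 15), reference_line(100, 15, datax=True))
```

In the default orientation, the slope of the reference line reads off $\sigma$ directly, which is one reason the line is easier to draw that way.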
In the figure below, the left panel shows the line $y = \mu + \sigma x = 100 + 15x$ in brown. In the right panel, the blue reference line connects the quartiles (red).
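The brown and blue lines estimate the same thing in two ways. The brown line uses the known $\mu = 100$ and $\sigma = 15$. The quartile line estimates both from the data: its slope is the IQR divided by $2 \times 0.6745 \approx 1.349$, and its intercept is the midpoint of the two sample quartiles. The answer's actual sample is not given, so this minimal Python sketch assumes a simulated sample from $N(100, 15)$ with an arbitrary seed. `quartile_line` is a hypothetical helper meant to mirror `qqline`'s defaults (`probs = c(0.25, 0.75)`, `qtype = 7`).

```python
import random
import statistics
from statistics import NormalDist


def quartile_line(sample):
    """(slope, intercept) of the line through the sample and N(0, 1) quartiles.

    Mirrors the default of R's qqline(): probs = c(0.25, 0.75) with quantile
    type 7, which is what statistics.quantiles(method="inclusive") computes.
    """
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    z1, z3 = NormalDist().inv_cdf(0.25), NormalDist().inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)  # IQR / 1.349..., an estimate of sigma
    intercept = q1 - slope * z1    # equals (q1 + q3) / 2, an estimate of mu
    return slope, intercept


random.seed(2)
# Hypothetical sample: 200 draws from N(100, 15), standing in for the answer's x.
x = [random.gauss(100, 15) for _ in range(200)]
slope, intercept = quartile_line(x)
print("known-parameter line: y = 100 + 15x")
print(f"quartile line:        y = {intercept:.2f} + {slope:.2f}x")
```

For a sample that really is normal, the two lines should nearly coincide. The quartile line needs no knowledge of $\mu$ or $\sigma$, which is convenient when, as usual, they are unknown.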
Addendum: I mentioned the Shapiro-Wilk test in a comment. The results of the S-W test for the sample `x` used above are shown below. The sample would be considered consistent with sampling from some normal population because the P-value of the test exceeds 5%.