QQ-Plot – Understanding the Y-Axis on a QQ Plot

qq-plot

I have a simple question about QQ plots in simple linear regression. I am a bit confused because the y-axis differs depending on where I look: it can be (1) the residuals, (2) the standardized residuals, or (3) the actual independent variable. Which one is better to use? I know that we are checking the assumption of normality, that is, that the errors (and so the residuals) are normally distributed. Does that imply that my independent variable is also normally distributed? And what is the proper y-axis?

I got a response in one of the chats: They are all identical for the purpose of normality checking. If the residual, epsilon, is normally distributed, then Y = XB + epsilon is normally distributed. … But the technical condition necessary for inference (p-values) is that the residuals are normally distributed. This is called the conditional distribution of Y, conditioned on all the X's.

I am not sure I understand the explanation (except the part about residuals and standardized residuals).

Best Answer

Both styles of Q-Q plot are considered correct, and both are in common use. The default in R is to put the 'data' or 'sample' quantiles on the vertical axis; the argument datax=TRUE puts the theoretical quantiles on the vertical axis instead. If you're showing the Q-Q plot along with a histogram or an ECDF plot, the latter style seems natural, because the data values then lie on the horizontal axis in all three plots.

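set.seed(2022)
x <- rnorm(100, 100, 15)   # sample of 100 from Normal(mean 100, sd 15)
par(mfrow = c(1, 3))       # three panels side by side
# Histogram with the population density overlaid
hist(x, prob = TRUE, col = "skyblue")
curve(dnorm(x, 100, 15), add = TRUE, col = "blue", lwd = 2)
# Empirical CDF with the population CDF overlaid
plot(ecdf(x))
curve(pnorm(x, 100, 15), add = TRUE, col = "blue", lwd = 2)
# Q-Q plot with the data on the horizontal axis
qqnorm(x, datax = TRUE)
qqline(x, datax = TRUE, col = "blue", lwd = 2)
par(mfrow = c(1, 1))       # restore the single-panel layout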

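However, if you want to draw a line around which the points of a normal sample might fall, that's easier to do with the default style of Q-Q plot: for a sample from a normal population with mean $\mu$ and standard deviation $\sigma$, the line is $y = \mu + \sigma x$. The default reference line in R (drawn with qqline) instead passes through the two points defined by the first and third quartiles of the sample and of the theoretical distribution.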
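To make the quartile-based reference line concrete, here is a minimal sketch that recomputes it by hand. It assumes qqline's documented defaults (probs = c(0.25, 0.75) and quantile type 7); the names probs, sample_q, theo_q, slope, and intercept are introduced only for this illustration.

```r
set.seed(2022)
x <- rnorm(100, 100, 15)

# First and third quartiles of the sample and of the standard normal
probs    <- c(0.25, 0.75)
sample_q <- quantile(x, probs, names = FALSE)  # type 7 is the default
theo_q   <- qnorm(probs)

# Line through the two points (theo_q[i], sample_q[i])
slope     <- diff(sample_q) / diff(theo_q)
intercept <- sample_q[1] - slope * theo_q[1]

qqnorm(x)
qqline(x, col = "blue", lwd = 3)                          # R's reference line
abline(a = intercept, b = slope, col = "red", lty = 2)    # hand-computed line
```

If the assumption about the defaults holds, the dashed red line should lie on top of the blue one. Note that the slope is the sample IQR divided by the IQR of the standard normal, so it estimates $\sigma$ from the middle half of the data rather than from the sample standard deviation.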

The default style in R (data on the vertical axis) seems to be used more often outside of North America. However, you should feel free to use whichever style of Q-Q plot you prefer. (The only exception might be if you're submitting to a journal that insists on one particular style.)

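In the figure below, the left panel shows the line $y = \mu + \sigma x = 100 + 15x$ in brown. In the right panel, the blue reference line from qqline passes through the intersections of the sample and theoretical quartiles, which are marked by red lines.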

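par(mfrow = c(1, 2))
# Left: population line y = mu + sigma * x
qqnorm(x)
abline(a = 100, b = 15, col = "brown")
# Right: qqline through the quartiles, with the quartiles marked in red
qqnorm(x)
qqline(x, col = "blue")
abline(h = quantile(x, c(.25, .75)), col = "red")  # sample quartiles
abline(v = qnorm(c(.25, .75)), col = "red")        # theoretical quartiles
par(mfrow = c(1, 1))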

Addendum: I mentioned the Shapiro-Wilk test in a comment. Results of the test for the sample x used above are shown below. The sample is considered consistent with sampling from some normal population because the P-value of the test exceeds 5%.

shapiro.test(x)

        Shapiro-Wilk normality test

data:  x
W = 0.99017, p-value = 0.678
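Returning to the regression setting of the question, the following is a minimal sketch rather than a definitive recipe. It uses simulated data of my own choosing (a uniform predictor x_reg, normal errors, and the hypothetical names x_reg, y_reg, and fit) to compare Q-Q plots of the raw residuals, the standardized residuals, and the response itself.

```r
set.seed(2022)
n     <- 100
x_reg <- runif(n, 0, 10)                     # predictor, deliberately non-normal
y_reg <- 2 + 3 * x_reg + rnorm(n, 0, 2)      # normal errors around the line
fit   <- lm(y_reg ~ x_reg)

par(mfrow = c(1, 3))
# Raw residuals: on the scale of the response
qqnorm(resid(fit), main = "Raw residuals")
qqline(resid(fit), col = "blue")
# Standardized residuals: rescaled to roughly unit variance
qqnorm(rstandard(fit), main = "Standardized residuals")
qqline(rstandard(fit), col = "blue")
# Response: its marginal distribution also reflects the spread of x_reg
qqnorm(y_reg, main = "Response")
qqline(y_reg, col = "blue")
par(mfrow = c(1, 1))
```

Under these assumptions, the first two panels should look nearly the same apart from the scale of the vertical axis, while the third can depart from a straight line even though the errors are normal. The normality assumption concerns the errors, which is the conditional distribution of the response given the predictor, not the marginal distribution of either variable.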
