Solved – Interpreting Q-Q plot and transforming data

qq-plot, r

I'm running 'plot(lux.leaf.model)' to check the assumptions of an lm, and the Q-Q plot looked off to me (1st photo). I know that judging a distribution visually is subjective, so I could use an expert opinion on what to do here. I have a couple of questions:

1) Is my original Q-Q plot bad enough to necessitate transforming my data? (It has that S pattern, maybe bimodal?)
2) Is a log transformation of the y variable suitable?

```
# Original model
lm.model <- lm(leaf_lux ~ session + surface, data)

# Model with a log-transformed response
lm.model <- lm(log(leaf_lux) ~ session + surface, data)
```

After transforming (2nd photo), the Q-Q plot still doesn't look great (but maybe good enough? Or, maybe the original was good enough?). I've done quite a bit of googling and can't find anything that clearly answers my question. Thank you.

Best Answer

Disclaimer: You mentioned "expert opinion". I wouldn't consider myself an expert, but I do have some experience working with linear transformations and data normalization vis-à-vis QQ-plots.

Regarding question 1 on whether it's bad enough (or rather good enough), it depends on your final objective. In your case, it looks like your efforts to correct the QQ-plot haven't helped your objective. So I suppose the answer to question 1 is yes, the original QQ-plot is good enough, because the transformation hasn't brought you closer to your actual goal. At least that's my opinion; I could be mistaken about your objective.

As a side note, the first QQ-plot looks one-sided to me, not S-shaped.

Regarding question 2, the log transformation has made the QQ-plot slightly better. I can't think of a clear-cut answer to what the best approach may be. So the answer to this question would be that it's probably okay, but it's hard to tell.
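Since the original data aren't shown, here is a minimal sketch of how one might compare the two fits side by side. Everything in it is an assumption for illustration: the data frame `sim`, the factor levels, the coefficients, and the log-normal noise are all made up, and the helper `qq_cor` is just a rough numeric summary (correlation between theoretical and sample quantiles), not a formal test.

```r
# Hypothetical data: the real 'data' frame is not shown in the question.
set.seed(1)
n <- 200
sim <- data.frame(
  session = factor(sample(c("am", "pm"), n, replace = TRUE)),
  surface = factor(sample(c("upper", "lower"), n, replace = TRUE))
)
# Multiplicative (log-normal) noise, so the log scale suits this simulated response
sim$leaf_lux <- exp(5 + 0.5 * (sim$session == "pm") +
                      0.3 * (sim$surface == "upper") + rnorm(n, sd = 0.6))

fit_raw <- lm(leaf_lux ~ session + surface, data = sim)
fit_log <- lm(log(leaf_lux) ~ session + surface, data = sim)

# Side-by-side normal Q-Q plots of the standardized residuals
op <- par(mfrow = c(1, 2))
qqnorm(rstandard(fit_raw), main = "Raw response"); qqline(rstandard(fit_raw))
qqnorm(rstandard(fit_log), main = "Log response"); qqline(rstandard(fit_log))
par(op)

# Rough summary: correlation between theoretical and sample quantiles
# (closer to 1 means the points hug a straight line more tightly)
qq_cor <- function(fit) {
  q <- qqnorm(rstandard(fit), plot.it = FALSE)
  cor(q$x, q$y)
}
c(raw = qq_cor(fit_raw), log = qq_cor(fit_log))
```

With real data the difference may be much less clear-cut than in this simulation, which is why the objective of the analysis matters more than the plot alone.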

Related Solutions

Solved – Interpreting QQ plot of poisson regression

The line does not correspond to zeros. The Poisson distribution is for counts, which cannot go below $0$. You can see that there are points below the line. Instead, it is drawn through the middle of the distribution to give you a visual point (er, line) of reference. There are various algorithms for drawing the line; a common one is to draw a line connecting the first and third quartiles. I can't tell if that's what was done in your case.
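To make the quartile rule concrete, here is a small sketch that builds the reference line by hand and overlays base R's `qqline()`, which by default also goes through the first and third quartiles. The sample `y` is made up for illustration; it is not the data from the question.

```r
set.seed(42)
y <- rexp(200)  # hypothetical skewed sample

# First and third quartiles of the sample and of the standard normal
probs <- c(0.25, 0.75)
y_q <- quantile(y, probs, names = FALSE)
x_q <- qnorm(probs)

# Line through the two quartile points
slope <- diff(y_q) / diff(x_q)
intercept <- y_q[1] - slope * x_q[1]

qqnorm(y)
abline(intercept, slope, col = "red", lwd = 3)  # manual quartile line
qqline(y, col = "blue", lty = 2)                # base R's reference line
```

The two lines should coincide, and points can clearly fall below the line without the line having anything to do with zero.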

At any rate, the qq-plot is constructed to help you assess if the residuals are normally distributed. But for a Poisson regression that doesn't make a lot of sense. So, I would probably ignore that plot.
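As a rough illustration of why the normal Q-Q plot is not very informative here, the sketch below simulates data from a correctly specified Poisson model with small means and plots the deviance residuals. The data and coefficients are invented for the example.

```r
set.seed(7)
n <- 300
x <- runif(n)
counts <- rpois(n, lambda = exp(-0.5 + 1 * x))  # the true model really is Poisson

fit <- glm(counts ~ x, family = poisson)
res <- residuals(fit, type = "deviance")

# Normal Q-Q plot of the deviance residuals
qqnorm(res); qqline(res)

# Only a handful of distinct count values occur, so the residuals fall in bands
table(counts)
```

Even though the model is right by construction, the residuals come from a few discrete count values and typically do not track the normal reference line well, so a poor-looking Q-Q plot by itself says little about the fit.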

Plotting to understand your model and to check your assumptions is a very good thing to do, though. You can find some good ideas here: Diagnostic plots for count regression.

Solved – QQ plot and $x = y$ line

Due to the lack of data in your question, I use the Gaussian distribution vs. a sample in my answer below (instead of the Laplace distribution vs. your sample data).

As far as the first two moments are concerned, the interpretation of what you see in the qq-plot is the following:

If the distributions are identical, you expect a line $x = y$:
```
x <- rnorm(1000)
qqnorm(x)
abline(0, 1, col = 'red')
```

If the means are different, you expect an intercept $a \neq 0$, meaning the points will lie above or below the $x=y$ line:
```
x <- rnorm(1000)
qqnorm(x + 1)
abline(0, 1, col = 'red')
```

If the standard deviations are different, you expect a slope $b \neq 1$:
```
x <- rnorm(1000)
qqnorm(x * 1.5)
abline(0, 1, col = 'red')
```
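The intercept and slope can also be read off numerically. As a small sketch (the shift of 1 and scale of 1.5 are just the values used in the examples above, combined), fit a straight line to the points of the qq-plot; $a$ should land near the sample mean and $b$ near the sample standard deviation:

```r
set.seed(123)
x <- rnorm(1000)
y <- 1 + 1.5 * x  # mean shifted to 1, standard deviation scaled to 1.5

# qqnorm() returns the plotted points: $x theoretical, $y sample quantiles
q <- qqnorm(y, plot.it = FALSE)

# Least-squares line through the qq-plot points
ab <- coef(lm(q$y ~ q$x))
names(ab) <- c("intercept", "slope")
ab
```

This is only a rough estimate, since the extreme quantiles are noisy, but it shows how $a$ and $b$ tie back to the first two moments.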

To get the intuition for this, you can simply plot the CDFs in the same plot. For example, taking the last one (the sample's empirical CDF, with the standard normal CDF overlaid in red):

```
plot(ecdf(x * 1.5))  # empirical CDF of the sample with sd = 1.5
lines(seq(-7, 7, by = 0.01), pnorm(seq(-7, 7, by = 0.01)), col = 'red')
```

Let's take for example 3 points on the y-axis: $CDF(q) = 0.2$, $0.5$, $0.8$ and see what value of $q$ (quantile) gives us each CDF value.

You can see that:

$$\begin{aligned} F^{-1}_{red}(0.2) &> F^{-1}_X(0.2) \text{ (quantile around -1)} \\ F^{-1}_{red}(0.5) &= F^{-1}_X(0.5) \text{ (quantile = 0)}\\ F^{-1}_{red}(0.8) &< F^{-1}_X(0.8) \text{ (quantile around 1)} \end{aligned}$$

Which is what's shown by the qq-plot.
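The same comparison can be checked numerically. This sketch uses the population quantiles of a normal with standard deviation 1.5 as a stand-in for the sample `x * 1.5`, which is an idealization:

```r
p <- c(0.2, 0.5, 0.8)

q_red <- qnorm(p)            # standard normal (the red curve)
q_x   <- qnorm(p, sd = 1.5)  # population version of the sample x * 1.5

data.frame(p, q_red = round(q_red, 2), q_x = round(q_x, 2))
```

The red quantiles come out at about -0.84, 0 and 0.84, against about -1.26, 0 and 1.26 for the wider distribution, which matches the three inequalities above.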
