Solved – Help interpreting Residuals vs Fitted Plots

data visualization, residuals

I'm analysing some data that compares treatment effects. My ANOVAs all indicate that there are no significant differences between treatments.

I have also applied a square root transformation to the dependent variable and re-fitted the ANOVAs.

I can see that the Normal Q-Q plot has improved and that the line in the residuals vs fitted plot is straighter, but I don't understand the clustering effect. Is the residuals vs fitted plot on the bottom the better one?

[Figure: Residuals vs Fitted and Normal Q-Q plots]

Best Answer

What precisely is the difference between the top and bottom rows? I guess that the bottom row is after the square root transformation. If so, then the residuals and fitted values are on different scales in the two rows.
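To make the point about scales concrete, here is a minimal sketch in Python (the thread itself has no code, so the language is my choice). It assumes a one-way layout, where each fitted value is just the treatment mean. The data, the `one_way_fit` helper and the `span` helper are all hypothetical and are not taken from the question.

```python
import math
from statistics import mean


def one_way_fit(groups):
    """Fitted values and residuals for a one-way layout.

    With a single factor, the least-squares fitted value for each
    observation is simply the mean of its treatment group.
    """
    fitted, residuals = [], []
    for values in groups.values():
        group_mean = mean(values)
        for v in values:
            fitted.append(group_mean)
            residuals.append(v - group_mean)
    return fitted, residuals


def span(xs):
    """Range of a sequence: max minus min."""
    return max(xs) - min(xs)


# Hypothetical responses for three treatments (not the asker's data).
raw = {
    "A": [0.00, 0.01, 0.04, 0.09],
    "B": [0.00, 0.04, 0.16, 0.25],
    "C": [0.01, 0.09, 0.16, 0.36],
}
# The same observations after the square root transformation.
rooted = {name: [math.sqrt(v) for v in vals] for name, vals in raw.items()}

fit_raw, res_raw = one_way_fit(raw)
fit_sqrt, res_sqrt = one_way_fit(rooted)

for label, fit, res in [("raw", fit_raw, res_raw), ("sqrt", fit_sqrt, res_sqrt)]:
    print(f"{label:>4}: fitted range = {span(fit):.3f}, residual range = {span(res):.3f}")
```

The two printed lines have different ranges on both axes, so the top and bottom rows of such a display cannot be compared by eye on a common scale.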

What's most obviously missing here are plots of observed values versus the factor(s) in your model. I can't work out how many factors there might be. It would also help to (1) show the data, if possible, and (2) give a precise model statement (R syntax would probably be transparent enough here, or other syntax if it's not R).

One criterion of a good model (and very far from the only one) is that the residuals appear to lack structure. It's more important that the fitted part appears to show the important structure; the two often go together, but neither quite implies the other.

So, the marked grouping bottom left is certainly a puzzle, but I don't see that any of the information you give allows us to venture an explanation, unless it's a side effect of the transformation, e.g. zeros and small values in the data being pulled apart. But you have a handle in so far as you can group your points, say into left and right groups, and then see how that grouping is echoed on other plots. One possible separation rule is, say, fitted = -0.04. (A scatter plot of square root observed versus observed may seem trivial but would bring home exactly what the transformation is doing.)
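Both suggestions can be sketched in a few lines of Python. This is an illustration under stated assumptions only. The value pairs, the records, the cutoff of 0.2, and the helper names `gap_change` and `split_by_fitted` are all hypothetical; with real data you would read the cutoff off your own plot.

```python
import math
from collections import Counter


# Part 1: the square root stretches gaps near zero and compresses gaps
# further out. The value pairs below are illustrative only.
def gap_change(lo, hi):
    """Return (gap on the raw scale, gap on the square root scale)."""
    return hi - lo, math.sqrt(hi) - math.sqrt(lo)


for lo, hi in [(0.0, 0.01), (0.01, 0.04), (0.81, 1.0)]:
    raw_gap, sqrt_gap = gap_change(lo, hi)
    print(f"{lo:.2f} to {hi:.2f}: raw gap {raw_gap:.2f}, sqrt gap {sqrt_gap:.2f}")


# Part 2: split points at a cutoff on the fitted axis, then summarise each side.
def split_by_fitted(records, cutoff):
    """Partition records into 'left' (fitted < cutoff) and 'right' (otherwise)."""
    sides = {"left": [], "right": []}
    for rec in records:
        side = "left" if rec["fitted"] < cutoff else "right"
        sides[side].append(rec)
    return sides


# Hypothetical square-root-scale observations and fitted values per treatment.
observed = {
    "A": [0.0, 0.1, 0.2, 0.3],
    "B": [0.0, 0.2, 0.4, 0.5],
    "C": [0.1, 0.3, 0.4, 0.6],
}
fitted_by_treatment = {"A": 0.15, "B": 0.275, "C": 0.35}
records = [
    {"treatment": t, "observed": y, "fitted": fitted_by_treatment[t]}
    for t, ys in observed.items()
    for y in ys
]

sides = split_by_fitted(records, cutoff=0.2)  # hypothetical cutoff
for side, recs in sides.items():
    treatments = Counter(r["treatment"] for r in recs)
    zeros = sum(1 for r in recs if r["observed"] == 0.0)
    print(side, dict(treatments), "zeros:", zeros)
```

If one side of the split turns out to be a single treatment, or to hold most of the zeros, that is the kind of echo on other summaries that would point to an explanation.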

In the bottom row, the range of your fitted values is about 0.04 and the range of your residuals is about 0.35, almost an order of magnitude larger. That is consistent with the statement of small treatment effects and implies that the grouping is more subtle than it appears here. In practice, however, there is usually a real story behind a grouping as distinct as yours.
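The arithmetic behind "almost an order of magnitude" is quick to check. This small Python sketch uses only the two approximate ranges quoted above; the variable names are mine.

```python
import math

# Approximate ranges read off the bottom-row plot, as quoted above.
fitted_range = 0.04
residual_range = 0.35

ratio = residual_range / fitted_range
print(f"residual range / fitted range = {ratio:.2f}")
print(f"orders of magnitude = {math.log10(ratio):.2f}")
```

A ratio a little under 10 means the spread within groups dwarfs the spread between fitted values, which is what you would expect when treatment effects are small.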