Solved – LOESS smoothing fit

loess, smoothing

I have three questions about a LOESS smoothing fit.

# Loess model
plot(Y ~ X, data = Dataset)
loess.model <- loess(Y ~ X, data = Dataset)
loess.model
hat <- predict(loess.model)
# Draw the fitted curve in order of increasing X
ord <- order(Dataset$X)
lines(Dataset$X[ord], hat[ord], col = "red")


Number of Observations: 52
Equivalent Number of Parameters: 4.62
Residual Standard Error: 0.9877

1) Is it possible to get the R-squared of the loess fit (red curve)? If so, how? Could you give me the R code?
2) It is impossible to get the equation of the smoothing fit (red curve). Does this mean we cannot make any predictions with a LOESS fit? By contrast, predictions are easy with a straight line.
3) What is the main purpose of using a smoothing fit instead of a straight line? Is the smoothing fit more accurate, even though we cannot make predictions from it?

Best Answer

The TeachingDemos package has a function called loess.demo that illustrates how loess models work. Reading its documentation and running the function a few times may help your understanding.

1) One way to get an R-squared value is to square the correlation between the original y-values and the predicted y-values at the same points (what you call hat). Of course, since the validity of R-squared is debated even for ordinary linear regression, you should be cautious about any interpretation here.
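A minimal sketch of that calculation, using the built-in cars data as a stand-in for your Dataset (an assumption, since your data are not shown). The second quantity, 1 - RSS/TSS, is an alternative pseudo R-squared that I am adding for comparison; for a loess fit the two need not agree exactly.

```r
# Stand-in data: cars (speed, dist) replaces Dataset (X, Y); this is an assumption
fit <- loess(dist ~ speed, data = cars)
hat <- predict(fit)  # fitted values at the observed speeds

# Pseudo R-squared: squared correlation between observed and fitted values
r2_cor <- cor(cars$dist, hat)^2

# Alternative pseudo R-squared: 1 - residual sum of squares / total sum of squares
rss <- sum((cars$dist - hat)^2)
tss <- sum((cars$dist - mean(cars$dist))^2)
r2_ss <- 1 - rss / tss

c(r2_cor = r2_cor, r2_ss = r2_ss)
```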

2) It is not impossible to get the equation of the smooth curve (finding meaning in it is another matter). And not having the equation does not mean that you cannot make predictions. Your code already uses the predict function on a loess object to get the curve; you can use predict on new x values as well.
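A short sketch of predicting at new x values, again with cars standing in for your data (an assumption). Note that the model is fit with a formula plus a data argument so that newdata can be matched by column name. With the default settings, loess does not extrapolate, so points outside the observed range come back as NA.

```r
# Stand-in data: cars replaces Dataset; this is an assumption
fit <- loess(dist ~ speed, data = cars)

# Predict at new speeds inside the observed range (4 to 25)
new_x <- data.frame(speed = c(10, 15.5, 20))
pred <- predict(fit, newdata = new_x)
pred

# Outside the observed range the default interpolating surface returns NA
out <- predict(fit, newdata = data.frame(speed = 30))
out

# Extrapolation requires fitting with a direct surface instead
fit_direct <- loess(dist ~ speed, data = cars,
                    control = loess.control(surface = "direct"))
out_direct <- predict(fit_direct, newdata = data.frame(speed = 30))
```

Extrapolated values from a local smoother should be treated with a lot of suspicion.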

3) Again, you misunderstand: predictions are possible (just not from an equation as simple as the one in linear regression). Loess lines can suggest transformations that may make the relationship linear, and they give an overall feel for the relationship. A loess line with a confidence interval can tell you whether the curvature is real and important, or whether a straight line would probably fit as well and any curvature is due to chance. Loess (and other smooths) have many uses in exploratory statistics.
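As a rough sketch of the confidence-interval idea, predict on a loess object can return pointwise standard errors. The band below is an approximate pointwise 95% band built from those standard errors and a t quantile; the cars data and the variable names are assumptions for illustration. If the straight line stays inside the band, the curvature may well be noise.

```r
# Stand-in data: cars replaces Dataset; this is an assumption
fit <- loess(dist ~ speed, data = cars)

# Evaluate the smooth and its standard errors on a grid within the data range
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 100))
p <- predict(fit, newdata = grid, se = TRUE)

# Approximate pointwise 95% confidence band for the smooth
crit <- qt(0.975, p$df)
upper <- p$fit + crit * p$se.fit
lower <- p$fit - crit * p$se.fit

plot(dist ~ speed, data = cars)
lines(grid$speed, p$fit, col = "red")
lines(grid$speed, upper, col = "red", lty = 2)
lines(grid$speed, lower, col = "red", lty = 2)

# Straight-line fit for comparison
abline(lm(dist ~ speed, data = cars), col = "blue")
```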

Related Solutions

How to Calculate Prediction Intervals for LOESS

I don't know how to do prediction bands with the original loess function but there is a function loess.sd in the msir package that does just that! Almost verbatim from the msir documentation:

library(msir)
data(cars)
# Calculates and plots a 1.96 * SD prediction band, that is,
# a 95% prediction band
l <- loess.sd(cars, nsigma = 1.96)
plot(cars, main = "loess.sd(cars)", col="red", pch=19)
lines(l$x, l$y)
lines(l$x, l$upper, lty=2)
lines(l$x, l$lower, lty=2)

Your second question is a bit trickier since loess.sd doesn't come with a prediction function, but you can hack it together by linearly interpolating the predicted means and SDs you get out of loess.sd (using approx). These can, in turn, be used to simulate data using a normal distribution with the predicted means and SDs:

# Simulate x data uniformly and y data according to the loess fit
sim_x <- runif(100, min(cars[,1]), max(cars[,1]))
pred_mean <- approx(l$x, l$y, xout = sim_x)$y
pred_sd <- approx(l$x, l$sd, xout = sim_x)$y
sim_y <- rnorm(100, pred_mean, pred_sd) 

# Plots 95% prediction bands with simulated data 
plot(cars, main = "loess.sd(cars)", col="red", pch=19)
points(sim_x, sim_y, col="blue")
lines(l$x, l$y)
lines(l$x, l$upper, lty=2)
lines(l$x, l$lower, lty=2)

Residual Plots – Rules of Thumb for Partial Residual Plots as Diagnostics for Linearity

I agree with @user2974951. You have to think about how a LOWESS line is fit. By design, it is very wiggly. It is extremely unlikely that it would be a perfectly straight line that falls exactly on the dashed regression line. In fact, in most cases where it did, I would suspect overfitting rather than evidence of an appropriate fit. If it pretty much has to wiggle, then the question is whether it wiggles randomly around your fitted regression line or veers off substantially (and, you'd guess, reliably). In your case, it doesn't look like the latter to me.

However, I think the component + residual plots you are using are harder to read, especially when you aren't very experienced yet. It has been known, going back at least to the 1970s with Tukey and Cleveland, that it is harder to judge whether data follow a line when the line is sloped. It is much easier when the line is horizontal. As a result, I would recommend you use plots of residuals vs. X instead. That is, you would make one plot for each X variable (in your case, presumably 5 plots), with the residuals on the vertical axis and the X variable on the horizontal axis. From there, you could plot a faint horizontal line at 0 and overlay a LOWESS line, if you'd like. (Bear in mind that you would have the same issues with the wiggliness of the LOWESS fit in that case.) Then you would look for systematic deviations from the horizontal line in your data.
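A minimal sketch of such residual vs. X plots. Your model and data are not shown, so this uses a hypothetical two-predictor regression on the built-in mtcars data purely for illustration.

```r
# Hypothetical model on built-in data; replace with your own fit and predictors
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)
predictors <- c("wt", "hp")

# One panel per predictor: residuals on the vertical axis, X on the horizontal
op <- par(mfrow = c(1, length(predictors)))
for (v in predictors) {
  plot(mtcars[[v]], res, xlab = v, ylab = "Residuals")
  abline(h = 0, col = "grey", lty = 2)           # faint horizontal reference line
  lines(lowess(mtcars[[v]], res), col = "red")   # LOWESS overlay
}
par(op)
```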

If you have both the standard plots at the top (i.e., including the scale location plot), and the individual residual vs. X plots, I would just ignore the residual vs. fitted plot. It has become a dominated strategy. You are better able to detect heteroscedasticity in the scale location plot, and non-linearity (more accurately, incorrect functional form) in the residual vs. X plots.
