[GIS] Creating a raster of the residuals of a regression between two rasters

grassrraster

I am trying to create a raster of the residuals of a regression between two rasters. i.e. I would like to carry out a regression of one raster against another of the same extent and plot the residual for each cell of the raster at the same resolution and extent as the original rasters.

It seems the GRASS 7.0 has a function called r.regression.multi which would let me carry this out but I have zero experience in GRASS and I can't even get GRASS 7.0 to work (GRASS 6.4 works fine but doesn't have r.regression.multi so is useless!)

I currently have the files in almost every format possible (Idrisi raster, TIFF, ASCII .grd in R) any help would be much appreciated!

Best Answer

Summary

Using R, once you have read in or created rasters x and y, the regression is performed and a raster of residuals r can be produced with three commands:

model <- lm(getValues(y) ~ getValues(x), na.action=na.exclude)
r <- y
r[] <- residuals.lm(model)

(To verify everything is working correctly, printing and plotting the results is a good idea.) The workflow generalizes readily to multiple independent rasters and to other models such as generalized linear models, nonlinear fits, models with spatial correlation, etc.

Worked example

Let's create some sample data:

#
# Create raster objects.
#
nRows <- 7
nCols <- 10
x <- raster(nrow=nRows, ncol=nCols)
y <- x
e <- x
#
# Simulate a bivariate relation y ~ x with error e.
#
set.seed(17)
e[] <- rnorm(nRows*nCols)
x[] <- 1:nCols
y <- x * 0.5 + 5.0 + e

Thus, the independent raster is x and the dependent is y; the errors in the linear relation are e. (In practice the errors are never known--they have to be estimated by the residuals--but with a simulated dataset we can hold on to the error values for checking things later.)

Do the regression:

model <- lm(getValues(y) ~ getValues(x), na.action=na.exclude)
summary(model)

The output lets us check that everything is working as expected. Here's part of it:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   4.92223    0.25842   19.05   <2e-16 ***
getValues(x)  0.53734    0.04165   12.90   <2e-16 ***
...
Residual standard error: 1.001 on 68 degrees of freedom

The estimates are highly significant and quite close to the actual values (4.92 vs. 5.00 and 0.54 vs. 0.05). The residual standard error of 1.001 closely estimates the error built into the simulation, 1.0. Finally, 68 = 7*10 - 2 counts the cells minus the two estimated parameters. Everything looks good.

Let's create your grid of residuals and plot it:

r <- y # Create a grid object for the residuals
r[] <- residuals.lm(model)
plot(r)

Residual plot

As a double-check (because we have been copying values out of and then back into raster format), let's plot the residuals against the true errors. The points should closely follow the line y=x, with just a tiny bit of scatter:

plot(getValues(e), getValues(r), xlab="Error grid", ylab="Residual grid")

Scatterplot

Everything is fine.