[GIS] Rasters in R – merge taking mean, excluding NAs. Overlay and Mosaic both giving funny results

mosaicoverlayqgisrraster

I have a series of raster images, all derived from LandSat data, which give pixels a value of 1 if they are predicted to contain water, 0 otherwise. Many of these images are from the LandSat 7 ETM+ SLC-off, which means they contain bands of missing data (NAs). Note the exact position of those bands varies from image to image. What I want to do is, for each pixel, take the mean value, simply excluding NAs.

I have tried the following (where water_rasters is a list containing the individual images described above):

water_mean <- do.call(overlay, c(water_rasters, fun = mean, na.rm = T)

and

water_mean <- do.call(mosaic, c(water_rasters, fun = mean, na.rm = T)

When using the overlay function, it appears that all pixels where any of my input layers were NA are NA in the output, so the water_mean raster is almost entirely NAs.

When using the mosaic function, I'm not quite sure what it does but the final image is certainly not the mean – almost every pixel is either 0 or 1, with no intermediate values (where I would expect quite a few intermediate values, and I have even checked individual pixels from the inputs and some should definitely be intermediate). However, it does seem to cover the whole scene, and does not have bands of NA data. As I said, I can't tell exactly what mosaic is doing – the water_mean raster certainly is generated from the input data (I can make out rivers, lakes etc) but it's not the mean of all of them or even the mean of a subset of them.

Thoughts? Workarounds? (Would be happy to use QGIS instead, but want the solution to be easily scaled up and repeatable.)

EDIT
As per request, simple example:

raster1 <- raster(matrix(c(1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1),5,5))
raster2 <- raster(matrix(c(NA,NA,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1),5,5))
raster3 <- raster(matrix(c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0),5,5))
raster4 <- raster(matrix(c(1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1),5,5))
raster5 <- raster(matrix(c(1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1),5,5))

extent(raster1) <- c(36,37,-3,-2)
extent(raster2) <- c(36,37,-3,-2)
extent(raster3) <- c(36,37,-3,-2)
extent(raster4) <- c(36,37,-3,-2)
extent(raster5) <- c(36,37,-3,-2)

projection(raster1) <- CRS("+proj=longlat +datum=WGS84")
projection(raster2) <- CRS("+proj=longlat +datum=WGS84")
projection(raster3) <- CRS("+proj=longlat +datum=WGS84")
projection(raster4) <- CRS("+proj=longlat +datum=WGS84")
projection(raster5) <- CRS("+proj=longlat +datum=WGS84")


water_rasters<-list(raster1, raster2, raster3, raster4,raster5)

water_mean_overlay <- do.call(overlay, c(water_rasters, fun = mean, na.rm = T))

water_mean_mosaic <- do.call(mosaic, c(water_rasters, fun = mean, na.rm = T))                      

In this example, water_mean_mosaic IS the desired output raster, where as overlay doesn't work (has NAs in the two top-left cells, where raster2 is NA).

So I guess the problem isn't my code per se… Could the problem with mosaic be due to large numbers of large files? I'm talking about ~100 LandSat images, each 7011×7611 pixels…

Best Answer

I think the problem with:

water_mean_overlay <- do.call(overlay, c(water_rasters, fun = mean, na.rm = T))

is that na.rm=T is being passed to overlay, and the help for overlay doesn't show an na.rm argument, so it is probably getting gobbled into the ... argument and then I'm not sure what's happening to it. Maybe it gets treated as another raster of all "TRUE" values? Or it just gets lost. Anyway...

The fix is to write a function that wraps mean but passes na.rm=TRUE:

> mean_narm = function(x,...){mean(x,na.rm=TRUE)}
> water_mean_overlay <- do.call(overlay, c(water_rasters, fun = mean_narm))
> plot(water_mean_overlay,col=rainbow(4))

but beware that mean_narm(c(NA,NA)) is NaN, and that might not be expected, and you might want to set all NaN back to NA in your results raster.