[GIS] Pixel correlations from two raster datasets in R

correlationrraster

I am having trouble calculating the pixel by pixel correlation coefficient between two datasets. My current code takes in two folders full of rasters and creates two independent raster stacks. These rasters all have the same cellsize and extent. I then try and take the correlation coefficient (spearman in my case) between the column values of those rasters.

library(raster)

r <- raster()
raster1 <- list.files(path = "data/List1", pattern = "*.tif$", full.names = T)
raster2 <- list.files(path = "data/List2", pattern = "*.tif$", full.names = T)
l1 <- stack(raster1)
l2 <- stack(raster2)

list1Values <- values(l1)
list2Values <- values(l2)
corValues <- vector(mode = 'numeric')

for (i in 1:dim(list1Values)[1]){
  corValues[i] <- cor(x = list1Values[i,], y = list2Values[i,], method = 'spearman')
}

corRaster <- setValues(r, values = corValues)

However at the correlation for loop it gives six error messages saying

In cor(x = list1Values[i, ], y = list2Values[i, ], method = "spearman") :
  the standard deviation is zero

Ignoring that and continuing on, the last line errors out saying

Error in setValues(r, values = corValues) : 
  length(values) is not equal to ncell(x), or to 1

My initial guess was that since the datasets have a lot of NA values (In this case it is a lot of open ocean that there is no data for), that could cause problems, so I added the

use = "complete.obs"

parameter to the cor function. This errors saying

Error in cor(x = list1Values[i, ], y = list2Values[i, ], method = "spearman",  : 
  no complete element pairs

My guess is that this last error is telling me the matrices it created don't line up, and no non-NA cells match. I have no clue how this is possible because I have been working with these rasters for years and they certainly line up. Other than that, I don't know why this isn't working.


This question is not a duplicate of Correlation/relationship between map layers in R? in any way. First I am not using point data, but instead raster. I am also interested in a simple pearson and/or spearman correlation coefficient NOT cross correlation nor spatial correlation. I'm also not using two layers, but two sets of hundreds of layers (making the statistical analysis of the correlation valid). Lastly, this code is doing fundamentally different things than that post and I am asking for specific help with the errors arising.

Best Answer

For reference, here is an approach, in memory, that uses mapply. This is one of the lesser known apply functions in R that lets you apply a function across objects.

Here, when I create objects from the raster stacks, I wrap the stack function call in list and values. This results in list objects, containing data.frames holding the raster values. This makes the data ready for mapply, which expects list objects. The use of sapply allows me to use a numeric row index 1:nrow(s1[[1]]) to aggregate the row-by-row correlations. This results in a vector of correlations, matching the rows in each lists data.frames. This vector is ordered to the cells in the source rasters and can be piped directly back into a source raster.

library(raster)
r <- raster(system.file("external/test.grd", package="raster"))
  s1 <- list(values( stack(r, r*0.15, r/1805.78) ))
  s2 <- list(values( stack(r^2, r/0.87*10, r/sum(values(r),na.rm=T)) ))

r[] <- as.vector(mapply(function(X,Y) { sapply(1:nrow(s1[[1]]), function(row) 
               cor(X[row,], Y[row,]))}, X=s1, Y=s2))
plot(r)   
Related Question