[GIS] How to utilise single ‘randomForest’ model for several rasters in R

rrandom forest

I am using randomForest package in R to derive species distribution models. When I predict the model to the RasterStack containing the predictors in which the values were extracted to build my model, everything goes fine. The problem is that I would like to predict this model I generated in the present scenario to the same set of predictors in the past and future. However, R gives me an error message.

Past and present predictors RasterStack are the same extension, resolution, etc. This is not the problem.

Reading the help of the 'predict' function {raster package} in R, I found the following:
"The names in the Raster object should exactly match those expected by the model. This will be the case if the same Raster object was used (via extract) to obtain the values to fit the model".

So, according to this, I can't predict my model to another RasterStack that was not used to obtain the values for the model. So, I would like to know another way that I could predict values from a present scenario to past or future scenarios.

Thanks a lot, Mariana

Best Answer

Welcome to SE-GIS! It is great that you have read the manual (as most people often forget to do it)! As it was said in the instruction - the names of the predictors must be the same that was used for training. The name of the predictors if you use 'raster' package are generated as 'raster_name'(without extension)+'.'+'band_number', e.g. 'imagery.tif' with 2 bands will generate names 'imagery.1' and 'imagery.2'. So if you want to use model generated for the one raster ('imagery.tif') to predict another ('another_imagery.tif') you need to supply raster with exact the same name that was used in the first place. Just rename the second raster (even temporary just for prediction) - that's all.

Alternatively you may do the same thing without 'raster' package at all. Training data have to be sampled separately and provided as a .csv-file. You will need to convert raster to data frame, rename its bands, make prediction and convert data frame into the raster again. The spatial reference information will be lost during transformation and you will need to recover it one way or another. I tried it once and I find it more tedious then renaming file for 'raster' package.

Here is the code that I used for myself (mostly taken from this article that is in Russian). SRS in this case was recovered externally (outside R). Here I had 5-band raster and training values as .csv-file (sampling of raster values was made in QGIS and results were saved as .csv). You may export to .csv the sampling results that were created using 'raster' package.

library(rgdal)
library(randomForest)

# import training data
train_new <- read.delim("train_new.csv")

#add class 2 (initial set contains only 0 and 1 values for $class)
train_new$class[ ((train_new$build_dist < 500) & (train_new$main_road < 300)) ] <- 2

train_new$class <- as.factor(train_new$class)
band_names <- names(train_new)

# start training
tt <- randomForest(class ~.,  data = train_new, ntree=1000, importance=T)


# import raster data
image <- "image.tif"
x <- new("GDALReadOnlyDataset", image)
width <- dim(x)[2]
height <- dim(x)[1]
imagedata <- data.frame(getRasterTable(x))
bnames <- band_names[1:5]

#rename raster bands to match names of attributes in .csv
names(imagedata) <- c("x", "y", bnames) 
imagedata <- imagedata[3:7] #delete unwanted 'x' and 'y' fields

# prediction
B <- predict(tt, imagedata, type="class")
BM <- as.matrix(B)

# export result to .tiff
tif_driver <- new("GDALDriver", "GTiff")
tif2 <- new("GDALTransientDataset", tif_driver, height, width, 1, 'Byte')
resmtx <- matrix(BM, width, height)
bnd1 <- putRasterData(tif2, resmtx)
tif_file <- "result.tif"
saveDataset(tif2, tif_file)
GDAL.close(tif2)
GDAL.close(tif_driver)