[GIS] Rf-Classification seems to work but gives an error 15 seconds later

classification, imagery, random forest

This is my code.
It reads a shapefile with training points and classifies an imagery stack (xvars).

Each line works when I run it on its own. When I run the whole script, it also works up to the prediction, which starts but stops with an error after about 15 seconds.

As far as I understand, the classification doesn't need the raster names to be set explicitly, because they are read automatically.

My training shapefile (sw_trainshape.shp) has an attribute table with these columns:

FID    Shape    Class      ObjectID       x              y
 1     Point     bush         1      481791,2429   5626286,6397
 2      ...       ...        ...            ...

My tif files are named band1, band2, band3 and so on. I have 6 bands.

I'm not very experienced yet, so I could use some help understanding why my code doesn't classify.

ERROR:

"Loading required package: tcltk

Error in predict.randomForest(model, blockvals, …) :
variables in the training data missing in newdata"

Code:

setwd("D:/BA-Workspace/DOP_10/orthophotos_abcd/R/test_run_R/test_other")


library(sp)
library(rgdal)
library(raster)
library(randomForest)

# create list of rasters
rlist=list.files(getwd(), pattern="tif$", full.names=TRUE) 

# CREATE RASTER STACK
xvars <- stack(rlist)      

# READ Raster TRAINING DATA
sdata <- readOGR(dsn=getwd(), layer="sw_trainshape")

# ASSIGN RASTER VALUES TO TRAINING DATA
v <- as.data.frame(extract(xvars, sdata))
sdata@data = data.frame(sdata@data, v[match(rownames(sdata@data), rownames(v)),])

# RUN RF MODEL
rf.mdl <- randomForest(x=sdata@data[,3:ncol(sdata@data)],
                       y=as.factor(sdata@data[,"Class"]),
                       ntree=501, importance=TRUE)

# CHECK ERROR CONVERGENCE
#plot(rf.mdl)

# PLOT mean decrease in accuracy VARIABLE IMPORTANCE
#varImpPlot(rf.mdl, type=1)

# PREDICT MODEL
predict(xvars, rf.mdl, filename="RfClassPred.img", type="response", 
    index=1, na.rm=TRUE, progress="window", overwrite=TRUE)

added sdata@data

Console:

> setwd("D:/BA-Workspace/DOP_10/orthophotos_abcd/R/test_run_R/test_other")
> 
> 
> library(sp)
> library(rgdal)
> library(raster)
> library(randomForest)
> 
> 
> # CREATE LIST OF RASTERS
> rlist=list.files(getwd(), pattern="tif$", full.names=TRUE) 
> 
> # CREATE RASTER STACK
> xvars <- stack(rlist)      
> 
> # READ Raster TRAINING DATA
> sdata <- readOGR(dsn=getwd(), layer="sw_trainshape")
OGR data source with driver: ESRI Shapefile 
Source: "D:/BA-Workspace/DOP_10/orthophotos_abcd/R/test_run_R/test_other", layer:     "sw_trainshape"
with 256 features and 10 fields
Feature type: wkbPoint with 2 dimensions
> 
> # ASSIGN RASTER VALUES TO TRAINING DATA
> v <- as.data.frame(extract(xvars, sdata))
> sdata@data = data.frame(sdata@data, v[match(rownames(sdata@data), rownames(v)),])
> 
> # RUN RF MODEL
> rf.mdl <- randomForest(x=sdata@data[,3:ncol(sdata@data)],     y=as.factor(sdata@data[,"Class"]),
+                        ntree=501, importance=TRUE)
> 
> # CHECK ERROR CONVERGENCE
> #plot(rf.mdl)
> 
> # PLOT mean decrease in accuracy VARIABLE IMPORTANCE
> #varImpPlot(rf.mdl, type=1)
> #setOldClass(SpatialPointsDataFrame)
> # PREDICT MODEL
> predict(xvars, rf.mdl, filename="RfClassPred.img", type="response", 
+         index=1, na.rm=TRUE, progress="window", overwrite=TRUE)
Error in predict.randomForest(model, blockvals, ...) : 
  variables in the training data missing in newdata


Solution, thanks to TimSalabim:

setwd("D:/BA-Workspace/sw_west_aug/reduced_size/")


library(sp)
library(rgdal)
library(raster)
library(randomForest)


# CREATE LIST OF RASTERS
rlist=list.files(getwd(), pattern="tif$", full.names=TRUE) 

# CREATE RASTER STACK
xvars <- stack(rlist)  

x <- coordinates(xvars)[, 1]
y <- coordinates(xvars)[, 2]

x_rst <- y_rst <- xvars[[1]]
x_rst[] <- x
y_rst[] <- y

xvars <- stack(x_rst, y_rst, xvars)
names(xvars) <- c("X", "Y", "focal_1", "focal_2", "focal_3")
# READ Raster TRAINING DATA
sdata <- readOGR(dsn=getwd(), layer="training_west")

# ASSIGN RASTER VALUES TO TRAINING DATA
v <- as.data.frame(extract(xvars, sdata))
sdata@data = data.frame(sdata@data, v[match(rownames(sdata@data), rownames(v)),])

sdata@data  <- sdata@data[-c(5,6)] 

# RUN RF MODEL
rf.mdl <- randomForest(x=sdata@data[,3:ncol(sdata@data)],
                       y=as.factor(sdata@data[,"class"]),
                       ntree=501, importance=TRUE)

# CHECK ERROR CONVERGENCE
#plot(rf.mdl)
#sdata@data 

# PLOT mean decrease in accuracy VARIABLE IMPORTANCE
#varImpPlot(rf.mdl, type=1)
#setOldClass(SpatialPointsDataFrame)
# PREDICT MODEL
predict(xvars, rf.mdl, filename="RfClassPred.img", type="response", 
    index=1, na.rm=TRUE, progress="window", overwrite=TRUE)

Best Answer

You need to make sure that names(sdata@data[,3:ncol(sdata@data)]) and names(xvars) are exactly the same. Check this using

identical(names(sdata@data[,3:ncol(sdata@data)]), names(xvars))

If this returns TRUE, your predict() call should run fine.
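If the check returns FALSE, it helps to see which names differ. The following is only a minimal sketch in base R: the two character vectors are made-up stand-ins for names(sdata@data[,3:ncol(sdata@data)]) and names(xvars), not values taken from the question. As far as I know, predict() raises this particular error when a variable used for training has no layer of the same name in the stack, so the first setdiff() is the one to look at.

```r
# Hypothetical names standing in for names(sdata@data[, 3:ncol(sdata@data)])
train_names <- c("ObjectID", "x", "y", "band1", "band2", "band3")
# Hypothetical names standing in for names(xvars)
layer_names <- c("band1", "band2", "band3")

# Training variables that the raster stack does not provide
missing_in_stack <- setdiff(train_names, layer_names)
# Stack layers that were never used for training
unused_layers <- setdiff(layer_names, train_names)

print(missing_in_stack)
print(unused_layers)
```

With real data, replace the two vectors with the actual names() calls. Anything listed in missing_in_stack either has to be dropped from the training columns or added to the stack as a layer.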

The warnings/errors from your edit are irrelevant here; they come from trying to display a SpatialPointsDataFrame (an S4 class object) as a standard data.frame in RStudio.

EDIT: It seems that your stack layer names and the column names of your sdata@data data frame differ. Make sure these are the same. If you would like to add the x and y coordinates as layers to your stack (whether this makes sense depends on your objective), you could do it like this:

x <- coordinates(xvars)[, 1]
y <- coordinates(xvars)[, 2]

x_rst <- y_rst <- xvars[[1]]
x_rst[] <- x
y_rst[] <- y

Then you would need to add those to your stack at the appropriate position:

xvars <- stack(x_rst, y_rst, xvars)

Note also that you have additional variables in your sdata@data data frame ("band1.1" etc.). I don't know where these come from; maybe you are merging something earlier? Again, for predict() to work properly, the layer names of your stack and the column names of your training data need to be identical.
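One way to avoid stray columns is to pick the predictor columns by name instead of by position. This is a rough sketch in base R with a mock data frame; train_df, layer_names and the values in them are invented for illustration and stand in for sdata@data and names(xvars). A plausible (but unconfirmed) source of columns like "band1.1" is running the line that appends the extracted values twice in the same session, since data.frame() makes duplicated column names unique.

```r
# Hypothetical layer names standing in for names(xvars)
layer_names <- c("band1", "band2", "band3")

# Mock attribute table standing in for sdata@data after extract()
train_df <- data.frame(
  Class    = c("bush", "grass", "bush", "water"),
  ObjectID = 1:4,
  band1    = c(10, 20, 11, 3),
  band2    = c(40, 35, 42, 8),
  band3    = c(70, 90, 69, 5),
  band1.1  = c(10, 20, 11, 3)  # stray duplicate column, as in the question
)

# Fail early if a stack layer has no matching training column
stopifnot(all(layer_names %in% names(train_df)))

# Keep only the columns named after stack layers, in stack order
train_x <- train_df[, layer_names, drop = FALSE]
train_y <- as.factor(train_df$Class)

print(names(train_x))
```

train_x and train_y can then be passed as x and y to randomForest(), so the model only ever sees variables that exist as layers in the stack.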
