R – Getting Masked Raster with No NAs and Summing Pixels Within Overlaying Polygon

Tags: missing-data, polygon, r, raster, zonal-statistics

I have done an IDW interpolation in R for precipitation data following the routine of Manny Gimond (https://mgimond.github.io/Spatial/interpolation-in-r.html) to get precipitation inside a catchment.

Here is the routine:

library(gstat)  # Use gstat's idw routine
library(sp)     # Used for the spsample function
library(raster) # Used for raster() and mask()

#Create an empty grid where n is the total number of cells
grd <- as.data.frame(spsample(P, "regular", n=50000))
names(grd) <- c("X", "Y")
coordinates(grd) <- c("X", "Y")
gridded(grd) <- TRUE  # Create SpatialPixel object
fullgrid(grd) <- TRUE  # Create SpatialGrid object

# Add P's projection information to the empty grid
proj4string(grd) <- proj4string(P)

# Interpolate the grid cells using a power value of 2 (idp=2.0)
P.idw <- gstat::idw(Precip_in ~ 1, P, newdata=grd, idp=2.0)

# Convert to raster object then mask to the boundary polygon W
r <- raster(P.idw)
r.m <- mask(r, W)

The problem is that the last step (i.e., masking the interpolated raster r to the boundary of the geometry of interest) produces a large number of NA values, which correspond to cells lying outside the masking polygon.

What I am looking for is a function to use in conjunction with mask that removes these NA values, so that the final masked raster contains no NA values.

Best Answer

"What I am looking for is a function to use in conjunction with mask that ensures that in the final masked raster these NA values are removed so that I have raster with no NA value."

A raster is an NxM grid where every cell has a value. If the value is not known, it is generally set to NA (or some special value like "-9999"). There's no way you can "remove" a value from a cell without replacing it with something else.
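Since the NA cells cannot be dropped from the grid, one option is to ignore them when summarising. The sketch below is only an illustration under stated assumptions: the small raster r and the square polygon W are hypothetical stand-ins for the interpolated surface and the catchment boundary, not the asker's data. It sums the cells inside the polygon while skipping NA, and uses trim() to shrink the extent by dropping outer rows and columns that contain only NA.

```r
library(raster)

# Hypothetical 4 x 4 raster standing in for the interpolated surface
r <- raster(nrows = 4, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 4)
values(r) <- 1:16

# Hypothetical square polygon covering the central 2 x 2 cells
W <- as(extent(1, 3, 1, 3), "SpatialPolygons")
crs(W) <- crs(r)  # give the polygon the same CRS as the raster

# Cells whose centres fall outside W become NA
r.m <- mask(r, W)

# Sum of the masked raster, skipping the NA cells
total_masked <- cellStats(r.m, stat = "sum", na.rm = TRUE)

# Same sum taken directly from the unmasked raster within the polygon
total_extract <- extract(r, W, fun = sum, na.rm = TRUE)

# Drop outer rows/columns that are entirely NA; NA cells inside the
# remaining bounding box (for an irregular polygon) would still be there
r.t <- trim(r.m)
```

For an irregular catchment the trimmed raster will still contain NA cells in the corners of its bounding box, which is consistent with the point made above: the cells can be ignored, but not removed.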

Related Solutions

[GIS] Interpolating raster to get finer resolution

If you merely want what you describe ("Like, if there were some way to break down the pixels into small areas, but the pixels still having the same values, it also would work."), then simply change the cell size to the new value (say from 9 to 3). You can do this in QGIS via the raster calculator or gdalwarp (see "How to resample GeoTIFF images to the same resolution?").
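In R, the same idea of splitting cells while keeping their values can be sketched with raster::disaggregate. This is an illustration only, not part of the original answer (which refers to QGIS and GDAL); the 2 x 2 raster and its values are hypothetical.

```r
library(raster)

# Hypothetical 2 x 2 raster with a cell size of 9 map units
r <- raster(nrows = 2, ncols = 2, xmn = 0, xmx = 18, ymn = 0, ymx = 18)
values(r) <- c(10, 20, 30, 40)

# Split every cell into 3 x 3 = 9 cells of 3 map units each.
# With no `method` argument the parent value is copied to the new
# cells rather than interpolated.
r.fine <- disaggregate(r, fact = 3)

res(r.fine)             # new cell size
unique(values(r.fine))  # only the original values remain
```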

If instead you want to interpolate new values at the smaller cell size, you could convert the cells to points and then use kriging if required, or methods such as spline, weighted nearest neighbor, and IDW. I think a simple TIN would suffice, which you then convert to a raster. You may even be able to create the TIN directly from the raster.

This may be a suitable case for Tobler's pycnophylactic interpolation on a raster. This mass-preserving analysis assigns values to your new cells based on their neighbors. I have only ever done this in ArcGIS, but it can be run in GRASS 7 using v.surf.mass. Your cells would be split into, say, 9 cells per original cell, and you would then apply pycnophylactic interpolation at this level.

I would suggest you start with a TIN.

Changing raster resolution: see "Resizing image resolution from 1000 m to 250 m in ArcGIS Desktop?"

How to create a TIN in QGIS: https://docs.qgis.org/2.6/en/docs/user_manual/plugins/plugins_interpolation.html

R Raster – Generating Prediction Raster from Random Forest Model

Just a quick note on this "problem". When you read in a raster, be it a single raster or one in a stack/brick, the default names are the names of the on-disk files. When using the raster::predict function, the names in the model object must match the names in the stack/brick. As such, it is good convention to assign the names that you want to use across your modeling workflow. This also provides an additional advantage in easing data management.

Let's say you have a naming convention in your raster layers that correspond to your covariates. You can define a vector of covariate names and then use the vector to read the data with very efficient code.

# Dummy covariate/raster names
covariates <- paste0("v", 1:10)

Create a vector of raster (tif) file names in the specified directory. If it differs from your working directory, you can use the full.names = TRUE argument in list.files.

rlist <- list.files(getwd(), "tif$")

Then you can use grep to query the vector of rasters to match your covariate names, and since you already have a vector of names you can then assign it to the stack object. The grep function returns the indices of the matching elements, hence the brackets. Using paste with collapse allows you to pass multiple values to grep, based on the covariates vector.

vars <- stack(rlist[grep(paste(covariates, collapse = "|"), rlist)])
names(vars) <- covariates
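One thing worth checking with this approach is that the files come back in the same order as the covariates vector, because list.files returns names sorted alphabetically. The sketch below uses a hypothetical file listing (no files are read) to show the difference between grep, which keeps listing order, and match on exact file names, which returns the files in covariate order. The ".tif" suffix and the file names are assumptions for illustration.

```r
# Hypothetical covariate names and on-disk listing. Alphabetical
# sorting puts "v10.tif" before "v2.tif".
covariates <- paste0("v", 1:10)
rlist <- c("elev.tif", "v1.tif", "v10.tif", "v2.tif", "v3.tif", "v4.tif",
           "v5.tif", "v6.tif", "v7.tif", "v8.tif", "v9.tif")

# grep() keeps the files in listing order, which differs from `covariates`
by_grep <- rlist[grep(paste(covariates, collapse = "|"), rlist)]

# match() on exact file names returns the files in covariate order
by_match <- rlist[match(paste0(covariates, ".tif"), rlist)]

# Compare the two orderings against the covariate names
data.frame(covariate = covariates, by_grep = by_grep, by_match = by_match)
```

Passing by_match to stack() before assigning names(vars) <- covariates would keep each name attached to the layer it describes.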

Now the names issue is solved for the raster::predict function, so we should address calling the function itself. It is important to keep in mind that raster::predict is a wrapper for other predict functions, each of which has its own data structures. In the example at hand, the underlying method is randomForest:::predict.randomForest. In a classification model, if type="prob" or "votes", a data.frame is returned with n columns, one for each class. You will notice that raster::predict has some arguments that control the output. The fun argument lets you pass a custom predict function, superseding any existing predict method for the model object. The index argument lets you define which column of a multi-column data.frame or matrix returned by a given predict method is used. With randomForest probability predictions a column is returned for each class, so you have to define which column you want using index. For a binomial model, to return the prevalence class ["1"] you would use index=2.

raster::predict(model=rf1, object=ApPl_stack, type="prob", index=2)

I would also note, based on the OP's code, that you want to avoid symbolic (formula) model calls if an index interface is possible. For some reason symbolic calls really slow down predictions such as this, specifically in randomForest. Here is what an index call looks like for randomForest.

rf1 <- randomForest(y=factor(dcc.s.dummy[,"SITE_NONSITE"]),
                   x=dcc.s.dummy[,-which(names(dcc.s.dummy)=="SITE_NONSITE")])

Or, if you know the positions of your covariates, simply:

rf1 <- randomForest(y=factor(dcc.s.dummy[,"SITE_NONSITE"]),
                    x=dcc.s.dummy[,2:ncol(dcc.s.dummy)])

For this model, I would also highly recommend addressing model fit through parameter selection. Otherwise, you are fitting random variation in your models, and this is reflected in the spatial estimates. Parsimony is actually an important factor in spatial estimates using nonparametric methods. You can address model/parameter selection using rfUtilities::rf.modelSel, as well as address multivariate multicollinearity issues and evaluate model fit/performance through a bootstrap approach.
