[GIS] R kriging cross validation returns NA for all prediction points

cross-validationinterpolationkrigingmissing datar

I'm trying to cross-validate the results of an [auto]Krige output but for some unknown reason the results of both autoKrige.cv (automap package) and krige.cv (gstat package) return NA for all values (e.g. var1.pred) other than the observed, yet, autoKrige works fine generating prediction values so I'm struggling to work out what the issue is.

The code I'm running is based on this code for the example meuse dataset:

data(meuse)
coordinates(meuse) = ~x+y
proj4string(meuse) <- CRS("+init=epsg:28992")    
meuse.krg <-autoKrige (zinc~1,meuse)    
meuse.cv <- autoKrige.cv (zinc~1,meuse)

This works fine.

And now with my data (these data can be found here: https://www.dropbox.com/sh/4lu25u9eun02vu6/AABfmp0ciUMWEV03hKgIVbQ_a?dl=0)

#load data from Hollicombe S1 shapefile    
hs1<- readOGR (".", "Hollicombe_S1_L1-5_A1.2")

#transforms to appropriate UTM projection    
hs1 <- spTransform(hs1.data,"+proj=utm +ellps=WGS84 +datum=WGS84 +zone=30") 

#autokrige working fine
hs1.krg <- autoKrige (MeanBioHei~1,hs1)

summary(hs1.krg)

krige_output:
Object of class SpatialPixelsDataFrame
Coordinates:
         min       max
x1  461034.8  461496.4
x2 5588235.5 5588991.2
Is projected: TRUE 
proj4string :
[+proj=utm +ellps=WGS84 +datum=WGS84 +zone=30 +towgs84=0,0,0]
Number of points: 4998
Grid attributes:
   cellcentre.offset cellsize cells.dim
x1          461034.8 4.084753       114
x2         5588235.5 4.084753       186
Data attributes:
   var1.pred          var1.var       var1.stdev   
 Min.   : -1.252   Min.   :635.4   Min.   :25.21  
 1st Qu.: 41.097   1st Qu.:662.6   1st Qu.:25.74  
 Median : 65.191   Median :676.6   Median :26.01  
 Mean   : 59.831   Mean   :682.6   Mean   :26.12  
 3rd Qu.: 81.756   3rd Qu.:689.3   3rd Qu.:26.25  
 Max.   :102.207   Max.   :985.3   Max.   :31.39  

#but then when trying to use autoKrige.cv    
hs1.cv <- autoKrige.cv  (MeanBioHei~1,hs1)

print(hs1.cv)
$krige.cv_output
            coordinates var1.pred var1.var observed residual zscore fold
1   (461031.4, 5588291)        NA       NA    0.182       NA     NA    1
2   (461043.2, 5588294)        NA       NA    0.197       NA     NA    2
3   (461043.3, 5588309)        NA       NA    0.222       NA     NA    3
4   (461055.2, 5588309)        NA       NA    0.163       NA     NA    4
5   (461055.3, 5588328)        NA       NA    0.240       NA     NA    5
6   (461055.3, 5588328)        NA       NA    0.196       NA     NA    6
7   (461067.1, 5588328)        NA       NA    0.204       NA     NA    7
8   (461067.1, 5588328)        NA       NA    0.228       NA     NA    8
9   (461067.3, 5588346)        NA       NA    0.224       NA     NA    9
10  (461067.3, 5588346)        NA       NA    0.197       NA     NA   10
11  (461069.7, 5588365)        NA       NA    0.237       NA     NA   11
12  (461079.2, 5588364)        NA       NA    0.236       NA     NA   12
13  (461079.4, 5588383)        NA       NA    0.194       NA     NA   13
14  (461079.4, 5588383)        NA       NA    0.222       NA     NA   14
15  (461091.2, 5588383)        NA       NA    0.198       NA     NA   15
16  (461091.3, 5588401)        NA       NA    0.000       NA     NA   16
17  (461091.3, 5588401)        NA       NA    0.197       NA     NA   17
18  (461103.2, 5588401)        NA       NA    0.211       NA     NA   18
19  (461103.3, 5588420)        NA       NA    0.156       NA     NA   19
20  (461103.3, 5588420)        NA       NA    0.190       NA     NA   20
21  (461115.1, 5588420)        NA       NA    0.186       NA     NA   21
22  (461115.1, 5588420)        NA       NA    0.193       NA     NA   22
23  (461115.3, 5588438)        NA       NA    0.178       NA     NA   23
24  (461115.3, 5588438)        NA       NA    0.210       NA     NA   24
25  (461127.1, 5588438)        NA       NA    0.152       NA     NA   25
26  (461127.3, 5588457)        NA       NA    0.194       NA     NA   26
27  (461127.3, 5588457)        NA       NA    0.203       NA     NA   27
28  (461139.2, 5588475)        NA       NA    0.000       NA     NA   28
29  (461139.2, 5588475)        NA       NA    0.224       NA     NA   29
30  (461140.7, 5588494)        NA       NA    0.192       NA     NA   30
[continues...]

I've tried using different variables of the data (hs1) and the error persists.

Note in the above examples I've not entered an argument for new_data (a prediction grid) but the error still persists when I use a grid.

Does anyone have any ideas what the problem may be?

Best Answer

The usual problem with kriging is duplicate locations. You can note that even the function autoKrige gives you the following warning:

Removed 197 duplicate observation(s) in input_data

You can also compare length(hs1@coords) and length(unique(hs1@coords)), which gives you results 1104 and 730 respectively.

If you use for example the following construct to remove the duplicated points, the cross-validation will run fine:

hs1 = hs1[which(!duplicated(hs1@coords)), ]
hs1.cv <- autoKrige.cv(MeanBioHei~1,hs1)

giving the output:

$krige.cv_output
        coordinates      var1.pred    var1.var    observed    residual     zscore   fold
1   (461031.4, 5588291)  0.212272155 0.005044351    0.182 -0.0302721554 -0.426226727    1
2   (461043.2, 5588294)  0.218947606 0.004810546    0.197 -0.0219476055 -0.316438954    2
3   (461043.3, 5588309)  0.223431394 0.004911102    0.222 -0.0014313944 -0.020425366    3
4   (461055.2, 5588309)  0.234672630 0.004750672    0.163 -0.0716726298 -1.039862237    4
5   (461055.3, 5588328)  0.228019987 0.004877474    0.240  0.0119800132  0.171537799    5

etc.

Load libraries and example data:

# Load libraries
library('raster')
library('rgdal')

# Load a SpatialPolygonsDataFrame example
# Load Brazil administrative level 2 shapefile
BRA_adm2 <- raster::getData(country = "BRA", level = 2)

# Convert NAMES level 2 to factor 
BRA_adm2$NAME_2 <- as.factor(BRA_adm2$NAME_2)

# Plot BRA_adm2
plot(BRA_adm2)
box()

# Define RasterLayer object
r.raster <- raster()

# Define raster extent
extent(r.raster) <- extent(BRA_adm2)

# Define pixel size
res(r.raster) <- 0.1

Figure 1: Brazil SpatialPolygonsDataFrame plot

Simple thread example

# Simple thread -----------------------------------------------------------

# Rasterize
system.time(BRA_adm2.r <- rasterize(BRA_adm2, r.raster, 'NAME_2'))

Time in my laptop:

# Output:
# user  system elapsed 
# 23.883    0.010   23.891

Multithread thread example

# Multithread -------------------------------------------------------------

# Load 'parallel' package for support Parallel computation in R
library('parallel')

# Calculate the number of cores
no_cores <- detectCores() - 1

# Number of polygons features in SPDF
features <- 1:nrow(BRA_adm2[,])

# Split features in n parts
n <- 50
parts <- split(features, cut(features, n))

# Initiate cluster (after loading all the necessary object to R environment: BRA_adm2, parts, r.raster, n)
cl <- makeCluster(no_cores, type = "FORK")
print(cl)

# Parallelize rasterize function
system.time(rParts <- parLapply(cl = cl, X = 1:n, fun = function(x) rasterize(BRA_adm2[parts[[x]],], r.raster, 'NAME_2')))

# Finish
stopCluster(cl)

# Merge all raster parts
rMerge <- do.call(merge, rParts)

# Plot raster
plot(rMerge)

Figure 2: Brazil Raster plot

Time in my laptop:

# Output:
# user  system elapsed 
# 0.203   0.033   8.688

More info about parallelization in R:

Best Answer

Related Solutions

[GIS] How to read *.adf files into R

Process Vector to Raster Faster with R – Speed Optimization Techniques

Load libraries and example data:

Simple thread example

Multithread thread example

Related Question