[GIS] NDVI time series with missing values

Tags: ndvi, r, raster, time-series

I have a Landsat NDVI time series from 2013–2017 with 23 observations per year (115 scenes in total). My goal is to get a smooth time series for a selected single pixel (perhaps using a Savitzky-Golay filter), but also to get a trend map for all pixels in my research area.
I first calculated the missing dates, created a raster filled with NA values for each of those missing scenes, and stacked all of the files together.
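
Roughly, that stacking step looks like the following (a sketch using the raster package; the file-naming pattern and the 16-day date sequence are placeholders, not my actual file layout):

library(raster)

# Scenes that actually exist, named by acquisition date (placeholder pattern)
files <- list.files("ndvi_scenes", pattern = "\\.tif$", full.names = TRUE)
have.dates <- as.Date(gsub(".*_(\\d{4}-\\d{2}-\\d{2})\\.tif$", "\\1", basename(files)))

# Complete 23-scenes-per-year date sequence (placeholder; build it to match
# the actual Landsat acquisition calendar)
all.dates <- seq(min(have.dates), max(have.dates), by = 16)

# Template layer filled with NA, used for every missing date
template <- raster(files[1])
template[] <- NA

# Stack the layers in date order, inserting the NA template where a scene is missing
layers <- lapply(all.dates, function(d) {
  if (d %in% have.dates) raster(files[which(have.dates == d)][1]) else template
})
ndvi <- stack(layers)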

Example of the output for one pixel:

 Date          NDVI
 2013-01-02        NA
 2013-01-18        NA
 2013-02-03        NA
 [...]
 2017-08-25  0.4551642537
 2017-09-10  0.3888225853
 2017-09-26  0.6013333797

Now I would like to calculate a smooth time series for this pixel. If I interpolate before building the time series, I get an irregular time series because the first year then only contains 19 observations. I tried following this tutorial, but could not get it to work.
How can I handle this problem and, more importantly, how can I calculate a pixel-wise regression for the whole scene? I found this, which seems to cover my needs, but can I apply it to an irregular time series?

Best Answer

Here is an approach that imputes NA values based on a local polynomial regression (loess). The default smoothing parameter (s) is fairly stable, but if the data are stochastic (say, climate data) it is also something that you may want to test.

I would also point out that, in the temporal-analysis literature, it is common practice to smooth a time series using a loess regression. I added a "smooth.data" argument that allows the function to return not only imputed NA values but also the smoothed time series. If smooth.data = FALSE only the NA values are replaced; otherwise all of the values are replaced with the smoothed loess estimate. If the data are smoothed, attention needs to be paid to the smoothing parameter (s) to ensure that the series is adequately smoothed without changing the properties of the original distribution. In a recent monthly 20-year LAI trend analysis I found that the default s = 0.80 was adequate for imputing NA values but too aggressive for smoothing the entire time series; I had to specify s = 0.30 to achieve the desired smoothing.

#### Loess impute function
# y            Vector with NA values to impute, or to be smoothed
# x.length     Length of the resulting vector (defaults to length(y))
# s            Smoothing parameter (see the loess "span" argument)
# smooth.data  (FALSE/TRUE) return the smoothed series rather than
#              only replacing NA values
impute.loess <- function(y, x.length = NULL, s = 0.80,
                         smooth.data = FALSE, ...) {
  if (is.null(x.length)) { x.length <- length(y) }
  x <- 1:x.length
  # Fit a local polynomial regression on the observation index
  p <- suppressWarnings(loess(y ~ x, span = s,
                              data = data.frame(x = x, y = y)))
  if (smooth.data == TRUE) {
    # Replace the entire series with the smoothed loess estimate
    y <- predict(p, x)
  } else {
    # Replace only the NA values with the loess prediction
    na.idx <- which(is.na(y))
    if (length(na.idx) > 0) {
      y[na.idx] <- predict(p, data.frame(x = na.idx))
    }
  }
  return(y)
}
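
For a single pixel you can check the behaviour of the function directly. A quick example with a made-up 23-observation series (the values are synthetic, not from the question's data):

# Toy one-year NDVI series (23 observations) with a few gaps
set.seed(42)
x <- 1:23
ndvi.px <- 0.4 + 0.25 * sin(2 * pi * (x - 6) / 23) + rnorm(23, sd = 0.03)
ndvi.px[c(2, 3, 11, 17)] <- NA

impute.loess(ndvi.px)                                # fill only the NA values
impute.loess(ndvi.px, smooth.data = TRUE, s = 0.30)  # return the fully smoothed series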

Here we apply the impute.loess function. I have had numerous problems using calc to apply this function, so I am using a for loop to iterate through the rows of the raster stack with getValues. An alternative approach is to coerce to a SpatialPixelsDataFrame and operate on the @data slot using apply, the same as below; this is more direct and does not require the back-and-forth between raster and sp. In this example I make a copy of the raster stack and populate it with NA values. To be memory safe you could use writeStart and writeValues to create and write to a raster on disk, as sketched after the loop below.

# Copy the stack and fill it with NA values to hold the results
ndvi.new <- ndvi
ndvi.new[] <- NA

# Iterate through the raster rows, imputing each pixel's time series
for (rl in 1:nrow(ndvi)) {
  v <- getValues(ndvi, rl, 1)
  ndvi.new[rl, ] <- as.matrix(t(apply(v, MARGIN = 1, FUN = impute.loess)))
}
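
As mentioned above, if the stack is too large to hold in memory you could instead stream the imputed rows to a file with writeStart/writeValues; a minimal sketch (the output filename is just an example):

# Memory-safe variant: write the imputed rows directly to a raster on disk
ndvi.new <- brick(ndvi, values = FALSE)
ndvi.new <- writeStart(ndvi.new, filename = "ndvi_imputed.tif", overwrite = TRUE)
for (rl in 1:nrow(ndvi)) {
  v <- getValues(ndvi, rl, 1)
  ndvi.new <- writeValues(ndvi.new, t(apply(v, MARGIN = 1, FUN = impute.loess)), rl)
}
ndvi.new <- writeStop(ndvi.new)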

Since apply can take additional function arguments, you can change the s and smooth.data arguments by supplying them after impute.loess in the apply call, e.g.,

as.matrix(t(apply(v, MARGIN=1, FUN=impute.loess, smooth.data = TRUE, s = 0.20)))

Occasionally this function will miss NA values at the absolute tails of the time series, because loess cannot extrapolate beyond the first/last observed value and so cannot converge on an end value. This is a statistical limitation and can only really be solved by taking the next/previous observed value and assigning it to the missing data. You could add an if/else statement in the function to account for this, or just deal with it in your analysis.
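
For example, a minimal sketch of that fix, carrying the nearest observed value into any NAs left at the ends of the series (this uses zoo::na.locf, which is my assumption, not part of the function above):

library(zoo)

# Fill NA values remaining at the start/end of the series with the
# nearest observed value (last observation carried forward/backward)
fill.tails <- function(y) {
  na.locf(na.locf(y, na.rm = FALSE), fromLast = TRUE, na.rm = FALSE)
}

# e.g., applied after imputation:
# y.filled <- fill.tails(impute.loess(y))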
