[GIS] Want cell linear regression values for a netCDF or multi-band raster

netcdf, open-source-gis, raster, regression, software-recommendations

I have climate raster data in netCDF format.

I want the slope and other regression statistics of temperature for each cell over time.

I have read that GEOV IDL can do a temporal regression that would achieve this, but it is not open source. Does anyone know of good open-source software that can do this? R, maybe?

Best Answer

Here is how you could get the slope using R and the raster package. To (also) get the intercept, see help(calc).

library(raster)
# your file
# b <- brick("file.nc")
# example data:
b <- brick(system.file("external/rlogo.grd", package="raster"))

# here time is 1 to n, but you can set it to something else
time <- 1:nlayers(b)
# write a function that returns the value or values of interest
fun <- function(x) { lm(x ~ time)$coefficients[2] }
x <- calc(b, fun)
plot(x)

If you have NA values, you need a more complex function:

fun2 <- function(x) { 
   d <- na.omit(cbind(x, time))
   if (nrow(d) > 2) {
      lm(x ~ time, data=data.frame(d))$coefficients[2] 
   } else {
      NA
   }
} 
b[1:10] <- NA
x2 <- calc(b, fun2)

To get R^2:

fun <- function(x) { summary(lm(x ~ time))$r.squared }
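Since the question asks for "other regression statistics" as well, here is a minimal sketch of a function that returns several values per cell. The name reg_stats and the simulated series are made up for illustration, and the last comment assumes calc() accepts a function that returns a vector (see help(calc) to confirm the multi-layer behaviour).

```r
# Hypothetical helper: slope, intercept, R^2 and slope p-value for one
# cell's time series x. NA values are dropped before fitting.
time <- 1:10  # stand-in for 1:nlayers(b)

reg_stats <- function(x) {
  ok <- !is.na(x)
  out <- c(slope = NA_real_, intercept = NA_real_, r2 = NA_real_, p = NA_real_)
  if (sum(ok) > 2) {
    s <- summary(lm(x[ok] ~ time[ok]))
    out[] <- c(s$coefficients[2, 1],  # slope estimate
               s$coefficients[1, 1],  # intercept estimate
               s$r.squared,
               s$coefficients[2, 4])  # p-value of the slope
  }
  out
}

# Simulated cell: true slope 0.5, small noise, one missing value
set.seed(1)
x <- 0.5 * time + rnorm(10, sd = 0.1)
x[3] <- NA
res <- reg_stats(x)
print(round(res, 3))

# With the raster package loaded, this would presumably be used as:
# stats <- calc(b, reg_stats)   # one output layer per returned value
```

Returning a fixed-length vector (filled with NA when there are too few observations) keeps the number of output values the same for every cell.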

Addendum

I noticed that this function is a bit slow as, for each raster cell, it fits a model and returns a lot of information that is not used. For each model some computations are repeated (as the independent variable is fixed). If you want simple output like the slope or intercept you can easily shortcut things by directly computing these via linear algebra, and pre-computing some of the intermediate (constant) results.

For the case without NAs only:

library(raster)
b <- brick(system.file("external/rlogo.grd", package="raster"))
time <- 1:nlayers(b)

LMfun <- function(x) { lm(x ~ time)$coefficients[2] }
system.time( xlm <- calc(b, LMfun) )

# user  system elapsed 
# 7.95    0.00    7.96 

# add a column of 1s for a model with an intercept
X <- cbind(1, time)
# pre-compute the constant part of least squares: (X'X)^-1 X'
invXtX <- solve(t(X) %*% X) %*% t(X)
# much reduced regression model; [2] picks out the slope
LAfun <- function(y) (invXtX %*% y)[2]
system.time( xla <- calc(b, LAfun) )

# user  system elapsed 
# 0.06    0.00    0.06

So this approach is about 130 times faster!
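Because the pre-computed matrix is the same for every cell, the slopes for all cells can also be obtained with a single matrix product. The sketch below uses a simulated matrix Y as a stand-in for the cell values (one row per cell, one column per layer); the names Y, P and slopes are illustrative only, and the commented raster calls are an assumption about how the result would be put back into a layer.

```r
set.seed(42)
n_time <- 6
time <- 1:n_time

# Simulated stand-in for values(b): 100 cells x 6 layers, no NAs
Y <- matrix(rnorm(100 * n_time), nrow = 100, ncol = n_time)

X <- cbind(1, time)
P <- solve(t(X) %*% X) %*% t(X)    # 2 x n_time; row 2 holds the slope weights

slopes <- as.vector(Y %*% P[2, ])  # slopes for all cells in one product

# Check the first cell against lm()
check <- unname(coef(lm(Y[1, ] ~ time))[2])
print(all.equal(slopes[1], check))

# With the raster package (assumed usage, for rasters that fit in memory):
# Y <- values(b); r <- setValues(raster(b), as.vector(Y %*% P[2, ]))
```

This only applies when there are no NAs, as in the case above, and it needs all cell values in memory at once.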

Related Solutions

[GIS] How to open/display and extract NetCDF files info

What I do in R is use the ncdf package to read the data into R, which puts the data into a multidimensional array. Then I use the plyr package combined with basic R tools to perform any processing steps (temporal average, extract timeseries). Finally, I visualize my results using the ggplot2 package. For more information on spatial data in R, please visit the R Spatial Taskview. A particularly interesting package for satellite raster data is the raster package.

ArcGIS Desktop – How to Make Linear Regression Across Multiple Raster Layers

For a problem this small the slopes are easily computed with a simple raster calculation. Given that the years are consecutive, let's name the rasters [y.1], [y.2], [y.3], [y.4], and [y.5] in temporal order. The slope grid is

(2/10) * ([y.5] - [y.1]) + (1/10) * ([y.4] - [y.2])

For other than five rasters--but still assuming they represent consecutive times--there is a similar formula. Each raster [y.i], for i = 1, 2, ..., through n, gets multiplied by a coefficient and all these results are added up. The coefficients are obtained by writing down the numbers

12, 24, 36, ..., 12n

and subtracting 6(n+1) from them. For instance, with n=8 we would subtract 6(8+1) = 54 from each, giving the eight numbers

-42, -30, -18, -6, 6, 18, 30, 42

These would multiply the rasters in temporal order. It's convenient to pair them by common coefficient sizes so you could write this out as

42 * ([y.8] - [y.1]) + 30 * ([y.7] - [y.2]) + 18 * ([y.6] - [y.3]) + 6 * ([y.5] - [y.4])

That reduces the amount of writing and the number of grid multiplications that are done. Finally, divide the result by n^3 - n. In the case n = 8, n^3 - n = 512 - 8 = 504. The net effect (if you want to compare this to other formulas) would be to multiply the input rasters by the coefficients

-1/12, -5/84, -1/28, -1/84, 1/84, 1/28, 5/84, 1/12

and add up the results.
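The recipe above (write down 12, 24, ..., 12n, subtract 6(n+1), divide by n^3 - n) can be checked in a few lines of R. The function name slope_weights and the sample values in y are made up for illustration.

```r
# Weights for the slope of n equally spaced, consecutive rasters
slope_weights <- function(n) {
  i <- seq_len(n)
  (12 * i - 6 * (n + 1)) / (n^3 - n)
}

w5 <- slope_weights(5)
w8 <- slope_weights(8)
print(w5)        # weights for five rasters
print(w8 * 84)   # weights for eight rasters, scaled by 84 for readability

# Compare the weighted sum with an ordinary least-squares fit for one cell
y <- c(3.1, 2.7, 4.0, 4.4, 5.2, 5.0, 6.3, 6.1)
s_formula <- sum(w8 * y)
s_lm <- unname(coef(lm(y ~ seq_along(y)))[2])
print(all.equal(s_formula, s_lm))
```

In a raster calculator the same weights would multiply the grids [y.1] through [y.n] in temporal order.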

In more general situations, where there may be varying intervals between the rasters, there is still a similar formula: the slope grid is always a linear combination of the rasters, but the coefficients will be less regular. The coefficients can be found from the general formula (X'X)^(-1)X' where X is the n by 2 "design matrix" having a column of n 1's and a second column set to the times of the grids.
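For unevenly spaced times, the weights can be read off the second row of (X'X)^(-1)X'. The following is a small sketch with made-up times and cell values. Times are expressed as years since the first raster, because shifting the time origin does not change the slope and small numbers keep X'X well conditioned.

```r
# Hypothetical acquisition times, in years since the first raster
times <- c(0, 1, 4, 5, 10)

X <- cbind(1, times)               # n x 2 design matrix
W <- solve(t(X) %*% X) %*% t(X)    # 2 x n
w <- as.numeric(W[2, ])            # slope weights, one per raster

# Equivalent closed form for the slope weights
w_closed <- (times - mean(times)) / sum((times - mean(times))^2)
print(all.equal(w, w_closed))

# Apply to one cell's values and compare with lm()
y <- c(10.2, 10.5, 11.9, 12.0, 14.1)
slope <- sum(w * y)
slope_lm <- unname(coef(lm(y ~ times))[2])
print(all.equal(slope, slope_lm))
```

The slope grid is then the sum of each raster multiplied by its entry in w.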

Usually the wrong way to do this is to loop over all the cells, pick out the cell values in the n rasters, and send them (and the times) to a line-fitting routine. That's much more work than necessary, because in effect each call to that routine is working out the same coefficients millions of times over (once for each cell). If, however, the rasters have a large number of missing values occurring in many different patterns, this longer way would make sense, for then you could obtain slopes even where one or more of the rasters is missing a value but the remaining rasters do have values.
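When missing values do occur in many patterns, one possible middle ground is to compute the per-cell sums that the least-squares slope needs, counting only the layers that are present in each cell. The sketch below does this on a small made-up matrix (one row per cell); the variable names are illustrative. It is not the cell-by-cell line-fitting loop described above, but it gives the same slopes for these rows.

```r
time <- 1:6

# Three example cells (rows) with different patterns of missing values
Y <- rbind(c(1,  2,  3,  4,  5,  6),
           c(2,  NA, 6,  8,  NA, 12),
           c(NA, NA, NA, NA, 5,  NA))

ok <- !is.na(Y)
n  <- rowSums(ok)                       # observations available per cell

# Times and values, zeroed wherever the cell value is missing
Tm <- matrix(time, nrow(Y), ncol(Y), byrow = TRUE) * ok
Y0 <- ifelse(ok, Y, 0)

St  <- rowSums(Tm)
Sy  <- rowSums(Y0)
Stt <- rowSums(Tm^2)
Sty <- rowSums(Tm * Y0)

# Least-squares slope from the sums
slope <- (n * Sty - St * Sy) / (n * Stt - St^2)
slope[n < 3] <- NA                      # require at least three observations
print(slope)
```

The threshold of three observations is a choice, not a requirement of the formula.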
