[GIS] Calculating Slope Between Pixels Through Time using ArcPy

arcpy · regression · slope · spatial-analyst · time

I have a 9 year weekly time series (~500 raster grids of equal cell size and extent). I'm interested in obtaining the regression line slope between pixels (Imagine stacking all 500 grids on top of one another and running a linear regression between each individual pixel). Essentially, this would measure the net change between pixels through my time series.

My main question is about the coefficients defined in the first answer to "Making linear regression across multiple raster layers using ArcGIS Desktop?"

Is the coefficient defined as 12 due to the temporal scale of the original question being at the monthly level and, if so, should my code's coefficient be set to 7 due to the weekly temporal scale between grids?

I have been referencing several similar Q&As, and I wrote this Python code after reading them:

from __future__ import division
import arcpy
import os
from arcpy import env
from arcpy.sa import *


# define workspace
arcpy.env.workspace = r"C:\Users\mbs7038\Documents\Remove_Gauge_Outliers\test\forSlope.gdb"

# enable overwriting
arcpy.env.overwriteOutput = True

# check out the Spatial Analyst extension
arcpy.CheckOutExtension('Spatial')

# define output path
slopePath = r"C:\Users\mbs7038\Documents\Remove_Gauge_Outliers\SpatialTrends\SlopeTEST.gdb"

# list all rasters in the workspace
rasters = arcpy.ListRasters('*')
# sort the raster names; note this sort is alphabetical, so the names must be
# zero-padded (e.g. week001, week002, ...) to come out in chronological order
rasters.sort()

# get the number of rasters
n = len(rasters)
print n

# set up the time index
i = 1

# denominator of the slope weights: n^3 - n
seed = (n * n * n) - n
print 'the global seed is {0}'.format(seed)

for raster in rasters:
    print i
    # OLS slope weight for time step i: (12i - 6(n + 1)) / (n^3 - n)
    coef = (12 * i - 6 * (n + 1)) / seed

    print 'Raster {0} is {1}:'.format(i, raster)
    print 'the coef for raster {0} is {1}'.format(i, coef)

    # multiply the raster by its coefficient and accumulate the sum
    if i == 1:
        outSlope = Raster(raster) * coef
        i += 1  # same as saying i = i + 1
    else:
        print 'adding {0} to outSlope'.format(raster)
        outSlope = outSlope + (Raster(raster) * coef)
        i += 1
    if i == 6:
        break  # limits this test run to the first 5 rasters; remove for the full series

# save the final slope grid
print 'saving final slope grid'
outSlope.save(os.path.join(slopePath, "floodChange_py"))

print 'script is complete'

I created this code as a test, and it appeared to work on a 5 week subset of my data.
If the reasoning referenced above is correct, then this code should work on a time series (of equal cell size and extent) of any length. Although the code ran without errors, I have a feeling I may have made a mistake somewhere.
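One way to check the coefficient logic is to reproduce the same weighted sum outside ArcPy and compare it against an explicit per-pixel regression. This is a sketch using plain NumPy on a small synthetic stack (the array sizes and random-walk data are illustrative assumptions, not the real rasters):

```python
import numpy as np

# Synthetic stand-in for the raster stack: 10 "weekly" grids of 3x3 pixels
# (sizes are illustrative assumptions, not the real data).
rng = np.random.default_rng(0)
n = 10
stack = rng.normal(size=(n, 3, 3)).cumsum(axis=0)  # a random walk per pixel

# Closed-form OLS slope weights for times 1..n, as in the loop above:
# coef_i = (12*i - 6*(n + 1)) / (n^3 - n)
i = np.arange(1, n + 1)
coef = (12.0 * i - 6.0 * (n + 1)) / (n**3 - n)

# the "stacked multiply-and-add": sum_i coef_i * grid_i
slope_weighted = np.tensordot(coef, stack, axes=1)

# reference: an explicit per-pixel least-squares fit
slope_ref = np.polyfit(i, stack.reshape(n, -1), 1)[0].reshape(3, 3)

print(np.allclose(slope_weighted, slope_ref))  # True if the weights are right
```

If the two grids agree for an arbitrary n, the same weights applied to the rasters should give the per-pixel regression slope.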

On a side note, I'm running this under Python 2 (in PyScripter), hence the "from __future__ import division" in the first line. Without it, dividing one integer by another with / performs floor division, rounding down toward negative infinity, which would reduce every coefficient to 0 or -1.
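For clarity on what that import changes, here is the same behavior shown with Python 3 syntax, where // reproduces the old integer-/ behavior:

```python
# Python 2's integer "/" behaved like today's "//": floor division, which
# rounds down toward negative infinity (it is not rounding toward zero).
print(7 // 2)   # 3
print(-7 // 2)  # -4, floored (truncation toward zero would give -3)
print(7 / 2)    # 3.5 under true division, the __future__ behavior
```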

Best Answer

Because we are free to choose how the times are designated, let's center them at zero with unit spacing, so that for n=500 times their values are -(n-1)/2, -(n-1)/2+1, ..., +(n-1)/2-1, +(n-1)/2. The ordinary least squares formula for the slope is obtained by multiplying each of these time values by its corresponding observation, averaging all those products, and dividing that average by the variance of the times. Because the average time is zero, the variance is just the average squared time,

([-(n-1)/2]^2 + [-(n-1)/2+1]^2 +  ... + [(n-1)/2]^2) / n = (n^2-1)/12.

That is where the factors of 12 come in. You can also now see why the factor of 1/(n^3-n) in the first-referenced answer appears. Its reciprocal equals n(n^2-1): the factor of n comes from the averaging of the n products and the factor of n^2-1 comes from the variance of the times.
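For a quick numeric check of that variance formula (a sketch in NumPy, nothing raster-specific):

```python
import numpy as np

n = 500
t = np.arange(n) - (n - 1) / 2.0  # centered, unit-spaced times
var = np.mean(t**2)               # the mean time is 0, so this is the variance

print(var)                         # 20833.25
print((n**2 - 1) / 12.0)           # 20833.25, matching (n^2-1)/12
print(n**3 - n == n * (n**2 - 1))  # the reciprocal factorization: True
```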

The units in which the slope is thereby expressed are the units of the observations divided by the time interval. For weekly time series, then, the slope gives the mean change in values per week.


A better way to assess the net change of values over the course of the times is to subtract the median of the values observed during the first few weeks from the median of the values observed during the last few weeks. This is conceptually simple and easy to compute (use local statistics). It is fast, because you only need to access a small subset of the datasets; robust, because all kinds of weird behavior during the time period will not affect it; and flexible, because you can choose the durations of the two end periods. It also more accurately captures the actual start-to-end differences, and it is easily applied to intermediate times to assess shorter-term changes.
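As a sketch of that idea outside ArcGIS (the array shape and the end-window length k are illustrative assumptions, not from the answer):

```python
import numpy as np

# Synthetic weekly stack standing in for the ~500 rasters; the shape and the
# end-window length k below are illustrative assumptions.
rng = np.random.default_rng(1)
stack = rng.normal(size=(500, 4, 4)).cumsum(axis=0)

k = 10  # number of weeks in each end window
start_median = np.median(stack[:k], axis=0)   # per-pixel median, first k weeks
end_median = np.median(stack[-k:], axis=0)    # per-pixel median, last k weeks
net_change = end_median - start_median        # per-pixel net change grid
print(net_change.shape)  # (4, 4)
```

In ArcGIS the two end medians can be computed with the local Cell Statistics tool (MEDIAN) over the first and last few rasters, then differenced with map algebra.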