Solved – Plotting Climate Theil-Sen Trend, Strange Intercept Estimate

climate, r, regression, time series, trend

I am trying to plot observed data and the Theil-Sen trend estimate on one plot, but the estimates are strange, though consistent across methods and packages. As you can see in the figure, I have a time series of yearly minimum temperatures, averaged across more than 30 stations, covering over 50 years and ranging between 5 and 10 degrees C:

When I use different packages in R (trend, mblm, and this one: https://rdrr.io/github/jentjr/gwstats/man/get_theilsen.html), I get statistically significant slopes of around 0.0148, but the intercept is always in the range [-28, -21]. I don't understand why I am getting this result, not only from a plotting standpoint (as it would not line up with the data at all) but also given what I know about the median method used to calculate this intercept. Can someone help me understand this result and help me figure out how to plot the trend and observed data together?

Best,

Jacob

edit:

I used Tom's advice and got the following:

The intercept seems high, and while I know it is based on the median of pairs rather than the average, that trend line just does not sit right with me.

Best Answer

The advice by @Tom helps: set 1950 as Year 0, and the results are much more reasonable (shown below, blue line). I don't know why this would be, but I also don't know how mblm calculates the intercept. As a note, this problem does not occur with the quantile regression shown below with the red line.
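One plausible explanation, offered as a sketch rather than a statement about mblm's internals: the intercept of any fitted line is its value at x = 0, and with calendar years x = 0 is the year 0, roughly 1950 years before the data start. With a slope near 0.0148 per year, moving the origin to 1950 shifts the intercept by about 0.0148 × 1950 ≈ 29, which is enough to turn an intercept between 7 and 8 into one in the low -20s. The base-R snippet below uses made-up data and two hypothetical helpers (`sen_slope`, and `sen_intercept` taken as median(y - b*x)) to show that both parameterisations describe the same line.

```r
# Toy series (made-up values, NOT the poster's data): trend of 0.015 degC/yr
set.seed(1)
year <- 1950:2012
temp <- 7.5 + 0.015 * (year - 1950) + rnorm(length(year), sd = 0.5)

# Theil-Sen slope: median of the slopes through every pair of points
sen_slope <- function(x, y) {
  idx <- combn(length(x), 2)
  median((y[idx[2, ]] - y[idx[1, ]]) / (x[idx[2, ]] - x[idx[1, ]]))
}

# Sen-style intercept (an assumption here; packages may differ): median of y - b*x
sen_intercept <- function(x, y, b) median(y - b * x)

b <- sen_slope(year, temp)                          # unchanged by shifting x
a_calendar <- sen_intercept(year, temp, b)          # line's value at year 0
a_shifted  <- sen_intercept(year - 1950, temp, b)   # line's value at 1950

# Both parameterisations give the same fitted values over the data range
fit_calendar <- a_calendar + b * year
fit_shifted  <- a_shifted + b * (year - 1950)
```

On this reading, a large negative intercept is not wrong in itself. It only looks wrong if the line is drawn against an x-axis that does not hold calendar years; with calendar years on the axis, `abline(a = a_calendar, b = b)` after `plot(year, temp)` should pass through the points.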

Data are approximate.

if(!require(mblm)){install.packages("mblm")}
if(!require(ggplot2)){install.packages("ggplot2")}

Data = read.table(header=TRUE, text="
Year               MinTemp
1950.0382043935053 5.519238641015146
1950.9878564606358 5.918338108882523
1952.116250511666  6.569279574293901
1953.0945558739256 6.527834629553828
1954.2161277118298 7.283667621776505
1955.0211488606906 6.906467458043391
1956.202756174103  6.739255014326648
1956.7649065356802 7.096193205075728
1957.8455450948288 8.48137535816619
1958.6410151453133 8.251023331968891
1959.8376313276028 7.853049529267295
1961.038340837768  7.392140810478921
1961.8024287078729 7.644289807613591
1962.7616318733799 7.8965411379451504
1963.7208350388867 8.14879246827671
1964.748260335653  7.3521285304952935
1965.9394187474418 7.038067949242736
1966.894528585073  7.353254195661073
1968.0652203574841 7.353868194842407
1968.7815527357075 8.34025787965616
1970.0109155410016 7.4388047482603366
1970.9714831491337 7.670077773229637
1971.9102196752629 8.23700368399509
1973.0863692181745 8.153704461727385
1974.0769545640608 7.923454768726976
1974.8342202210397 8.280495292672944
1976.021285304953  8.029369627507165
1976.9791240278348 8.302599263200984
1977.9437849638423 8.470937372083505
1978.9384636376042 8.177752762996317
1979.701187065084  8.450880065493248
1981.2593805430483 8.493655341792879
1982.0412061672807 8.473086369218175
1983.0126893164143 8.536532951289399
1983.8095238095239 8.28520261972984
1984.8042024832857 7.992018010642654
1985.7715923045437 8.118399508800655
1986.726702142175  8.433585755218994
1988.1129758493655 8.119627507163324
1988.9411925228544 7.385796152271798
1989.9249556556147 7.260437986082687
1991.032883067267  8.226054031927958
1991.9948151180245 8.436348751534998
1992.9826715786603 8.24805566925911
1993.8272615636513 7.262484650020468
1994.785100286533  7.5357142857142865
1995.9953608950746 6.927957429390096
1996.7075999454223 7.9772820302906275
1997.6627097830537 8.292468276708965
1998.8402237685907 8.188190749079002
1999.817164688225  8.167724109701188
2000.8336744439898 7.538886614817848
2002.0193750852777 7.308739255014327
2002.756174102879  7.980454359394189
2003.7440305635148 7.792161277118298
2004.8833401555466 8.275276299631601
2005.6815390912814 8.002967662709784
2006.6707599945423 7.7936962750716345
2008.0556692591076 7.500716332378224
2009.4815118024287 9.578387228817029
2010.7313412471005 8.362259516987312
2011.7055532814845 8.383749488334017
2012.660663119116  8.698935734752354
")

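# The approximate x-values are fractional; round them to whole years
Data$Year = round(Data$Year)

# Re-express time as years after 1950, so the intercept is the fitted value in 1950 rather than in year 0
Data$Year = Data$Year - 1950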


library(mblm)

model= mblm(MinTemp ~ Year, data=Data)

summary(model)

   ### Coefficients:
   ###             Estimate      MAD V value Pr(>|V|)    
   ### (Intercept) 7.820528 0.499264    2016 5.29e-12 ***
   ### Year        0.009301 0.012388    1758 2.88e-07 ***

Sum = summary(model)$coefficients

library(ggplot2)

ggplot(Data, aes(x=Year, y=MinTemp)) + 
  geom_point() +
  geom_abline(intercept = Sum[1], slope = Sum[2], color="blue", size=1.2) +
  labs(x = "Years after 1950")

### Optional quantile regression follows

if(!require(quantreg)){install.packages("quantreg")}

library(quantreg)

model.q = rq(MinTemp ~ Year, data = Data, tau = 0.5)

summary(model.q)

   ### Coefficients:
   ###             coefficients lower bd upper bd
   ### (Intercept) 7.46789      6.97021  7.78631 
   ### Year        0.01470      0.00656  0.02508 

library(ggplot2)

model.null = rq(MinTemp ~ 1, data = Data, tau = 0.5)

anova(model.q, model.null)

   ### Quantile Regression Analysis of Deviance Table
   ### 
   ### Model 1: MinTemp ~ Year
   ### Model 2: MinTemp ~ 1
   ###   Df Resid Df F value  Pr(>F)  
   ### 1  1       61  5.7424 0.01964 *

Sumq = summary(model.q)$coefficients

ggplot(Data, aes(x=Year, y=MinTemp)) + 
  geom_point() +
  geom_abline(intercept = Sumq[1], slope = Sumq[2], color="red", size=1.2) +
  labs(x = "Years after 1950")

Related Solutions

Solved – Intercept calculation in Theil-Sen estimator

The Theil-Sen estimator is essentially an estimator for the slope alone; the line itself has been constructed in a host of different ways, and there is a large variety of ways to calculate the intercept.

You said:

My understanding of the intercept calculation is that I first calculate the median slope, and then construct a line through every data point with this slope, find the intercept of every line, and then take the median intercept.

A common one (probably the most common) is to compute median($y-bx$). This is what Sen looked at, for example; if I understand your intercept definition correctly this is the same as the intercept you mention.

There are a couple of approaches that compute the intercept of the line through each pair of points and then take some kind of weighted median of those intercepts (putting more weight on the pairs of points that are further apart in x-space).

Another is to try to get an estimator with higher efficiency at the normal (akin to that of the slope estimator in typical situations) and similar breakdown point to the slope estimate (there's probably little point in having better breakdown at the expense of efficiency), such as using the Hodges-Lehmann estimator (median of pairwise averages) on $y-bx$. This has a kind of symmetry in the way the slopes and intercepts are defined ... and generally gives something very close to the LS line when the normal assumptions nearly hold, whereas the Sen-intercept can be - relatively speaking - quite different.

Some people just compute the mean residual.

There are still other suggestions that have been looked at. There's really no 'one' intercept to go with the slope estimate.

Dietz lists several possibilities, possibly even including all the ones I mentioned, but that's by no means exhaustive.
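As a rough illustration of three of the intercept choices mentioned above (Sen's median of $y-bx$, the Hodges-Lehmann estimate on $y-bx$, and the mean of $y-bx$), here is a base-R sketch on simulated data. The data and variable names are invented for the example, and the Hodges-Lehmann version shown includes each point paired with itself, which is only one of a few conventions.

```r
# Simulated data (made up for illustration): true intercept 2, true slope 0.5
set.seed(42)
x <- 0:29
y <- 2 + 0.5 * x + rnorm(length(x))

# Theil-Sen slope: median of all pairwise slopes
idx <- combn(length(x), 2)
b <- median((y[idx[2, ]] - y[idx[1, ]]) / (x[idx[2, ]] - x[idx[1, ]]))

# What is left of y once the slope term is removed
r <- y - b * x

# 1. Sen-style intercept: median of y - b*x
int_sen <- median(r)

# 2. Hodges-Lehmann on y - b*x: median of pairwise averages
#    (here including each point averaged with itself)
walsh <- outer(r, r, "+") / 2
int_hl <- median(walsh[upper.tri(walsh, diag = TRUE)])

# 3. Mean of y - b*x
int_mean <- mean(r)

print(c(sen = int_sen, hodges_lehmann = int_hl, mean = int_mean))
```

With well-behaved noise like this the three values tend to be close to one another; they can separate more when the residuals are skewed or contain outliers.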

Solved – Showing spatial and temporal correlation on maps

I think there are a few options for showing this type of data:

The first option would be to conduct an "Empirical Orthogonal Functions Analysis" (EOF) (also referred to as "Principal Component Analysis" (PCA) in non-climate circles). For your case, this should be conducted on a correlation matrix of your data locations. For example, your data matrix dat would have the spatial locations in the column dimension and the measured parameter over time in the rows, so each column is the time series for one location. The prcomp() function will allow you to obtain the principal components, or dominant modes of correlation, relating to this field:

res <- prcomp(dat, retx = TRUE, center = TRUE, scale. = TRUE) # center and scale. should both be TRUE for an analysis of dominant correlation modes
# res$x and res$rotation will contain the PC modes in the temporal and spatial dimension, respectively.

The second option would be to create maps that show correlation relative to an individual location of interest:

C <- cor(dat)
#C[,n] would be the correlation values between the nth location (e.g. dat[,n]) and all other locations.
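A minimal toy version of both options, assuming a simulated matrix dat with 100 time steps in the rows and 5 locations in the columns (made-up data sharing one common signal, not a real field):

```r
# Simulated field: every location = shared signal + independent noise
set.seed(1)
n_time <- 100
n_loc <- 5
common <- rnorm(n_time)
dat <- sapply(seq_len(n_loc), function(i) common + rnorm(n_time, sd = 0.5))
colnames(dat) <- paste0("loc", seq_len(n_loc))

# Option 1: EOF/PCA on the correlation matrix (centered and scaled columns)
res <- prcomp(dat, retx = TRUE, center = TRUE, scale. = TRUE)
expl_var <- res$sdev^2 / sum(res$sdev^2)  # fraction of variance per mode
pc1_time <- res$x[, 1]                    # leading mode through time
pc1_space <- res$rotation[, 1]            # leading mode's loading per location

# Option 2: correlation of every location with location n
C <- cor(dat)
n <- 1
cor_with_n <- C[, n]
```

For mapping, pc1_space or cor_with_n would be the values attached to each location's coordinates.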

EDIT: additional example

While the following example doesn't use gappy data, you could apply the same analysis to a data field following interpolation with DINEOF (http://menugget.blogspot.de/2012/10/dineof-data-interpolating-empirical.html). The example below uses a subset of monthly anomaly sea level pressure data from the following data set (http://www.esrl.noaa.gov/psd/gcos_wgsp/Gridded/data.hadslp2.html):

library(sinkr) # https://github.com/marchtaylor/sinkr

# load data
data(slp)

grd <- slp$grid
time <- slp$date
field <- slp$field

# make anomaly dataset
slp.anom <- fieldAnomaly(field, time)

# EOF/PCA of SLP anom
P <- prcomp(slp.anom, center = TRUE, scale. = TRUE)

expl.var <- P$sdev^2 / sum(P$sdev^2) # explained variance
cum.expl.var <- cumsum(expl.var) # cumulative explained variance
plot(cum.expl.var)

Map the leading EOF mode

# make interpolation
require(akima)
require(maps)

eof.num <- 1
F1 <- interp(x=grd$lon, y=grd$lat, z=P$rotation[,eof.num]) # interpolated spatial EOF mode


png(paste0("EOF_mode", eof.num, ".png"), width=7, height=6, units="in", res=400)
op <- par(ps=10) #settings before layout
layout(matrix(c(1,2), nrow=2, ncol=1, byrow=TRUE), heights=c(4,2), widths=7)
#layout.show(2) # run to see layout; comment out to prevent plotting to the png device
par(cex=1) # layout has a tendency to change par()$cex, so this step is important for control

par(mar=c(4,4,1,1)) # I usually set my margins before each plot
pal <- jetPal
image(F1, col=pal(100))
map("world", add=TRUE, lwd=2)
contour(F1, add=TRUE, col="white")
box()

par(mar=c(4,4,1,1)) # I usually set my margins before each plot
plot(time, P$x[,eof.num], t="l", lwd=1, ylab="", xlab="")
plotRegionCol()
abline(h=0, lwd=2, col=8)
abline(h=seq(par()$yaxp[1], par()$yaxp[2], len=par()$yaxp[3]+1), col="white", lty=3)
abline(v=seq.Date(as.Date("1800-01-01"), as.Date("2100-01-01"), by="10 years"), col="white", lty=3)
box()
lines(time, P$x[,eof.num])
mtext(paste0("EOF ", eof.num, " [expl.var = ", round(expl.var[eof.num]*100), "%]"), side=3, line=1) 

par(op)
dev.off() # closes device

Create correlation map

loc <- c(-90, 0)
target <- which(grd$lon==loc[1] & grd$lat==loc[2])
COR <- cor(slp.anom)
F1 <- interp(x=grd$lon, y=grd$lat, z=COR[,target]) # interpolated correlation field for the target location


png(paste0("Correlation_map", "_lon", loc[1], "_lat", loc[2], ".png"), width=7, height=5, units="in", res=400)

op <- par(ps=10) #settings before layout
layout(matrix(c(1,2), nrow=2, ncol=1, byrow=TRUE), heights=c(4,1), widths=7)
#layout.show(2) # run to see layout; comment out to prevent plotting to the png device
par(cex=1) # layout has a tendency to change par()$cex, so this step is important for control

par(mar=c(4,4,1,1)) # I usually set my margins before each plot
pal <- colorRampPalette(c("blue", "cyan", "yellow", "red", "yellow", "cyan", "blue"))
ncolors <- 100
breaks <- seq(-1, 1, length.out=ncolors+1)
image(F1, col=pal(ncolors), breaks=breaks)
map("world", add=TRUE, lwd=2)
contour(F1, add=TRUE, col="white")
box()

par(mar=c(4,4,0,1)) # I usually set my margins before each plot
imageScale(F1, col=pal(ncolors), breaks=breaks, axis.pos = 1)
mtext("Correlation [R]", side=1, line=2.5)
box()

par(op)

dev.off() # closes device