Solved – How to model binary dependent data with temporal autocorrelation

Tags: autocorrelation, binary data, r, time series

I am trying to model annual tree nut production using climate predictors.

The nut data (the dependent variable) are a binary time series (0 = unsuccessful, 1 = successful nut production) with one observation per year. The series spans 90 years with two missing years (88 observations).

The independent variables are monthly climate variables, including months in previous years (for example, Temp.July.t and Temp.July.t-1).

I'm using R, and have a basic-to-intermediate knowledge of statistics.

My problem is that the dependent data has strong temporal autocorrelation (nut production cannot be successful two years running). I'm looking for a pointer towards a technique that will allow me to deal with the autocorrelation in the binary data and create a statistical model that allows me to investigate the relationship between nut production and climate.

Thank you.

Best Answer

If the only autocorrelation present is that a successful year must be followed by an unsuccessful year, a simple and effective approach would be to fit a binary logistic regression to a dataset that excludes the dependent variable for all those years in which the preceding year was successful. Because those years were already determined to be unsuccessful by the success of the preceding year, their outcome can yield no information about the relevance of the independent climate variables to success or failure. With 90 years of data, excluding those years should still leave a reasonably large dataset.

The resulting model will be partly deterministic (the year after a successful year is an unsuccessful year) and partly stochastic (the year after an unsuccessful year has a certain probability of success, dependent on the climate variables, as estimated by the regression).
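As a sketch of that two-part structure, the model can be written out as follows. The notation is an assumption introduced here, not taken from the answer: $Y_t$ is the success indicator for year $t$, $\mathbf{x}_t$ is the vector of climate predictors (including lags) for year $t$, and $\beta_0, \boldsymbol{\beta}$ are the logistic regression coefficients.

```latex
\[
P(Y_t = 1 \mid Y_{t-1} = 1) = 0,
\qquad
P(Y_t = 1 \mid Y_{t-1} = 0,\ \mathbf{x}_t)
  = \frac{1}{1 + \exp\{-(\beta_0 + \mathbf{x}_t^{\top}\boldsymbol{\beta})\}}
\]
```

The first expression is the deterministic part. The second is the stochastic part, and it is the only part the regression has to estimate.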

(With apologies if this is stating the obvious) excluding the dependent variable for year t because year t-1 was successful does not mean dropping all the data for year t. The independent variables for the months in year t are still potentially relevant as lagged variables influencing success or otherwise in years after t.
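A minimal R sketch of this approach follows. It uses simulated data, since the original dataset is not shown; the column names (`nut`, `temp_july`, `temp_july_lag1`, `prev_nut`), the simulated coefficients, and the positions of the missing years are all hypothetical. It also drops years whose preceding year is missing, which is one possible choice rather than something the answer prescribes.

```r
# Simulated stand-in for the real data: 90 years, one climate predictor
# (hypothetical July temperature) and its one-year lag.
set.seed(1)
n <- 90
temp_july <- rnorm(n, mean = 18, sd = 1.5)
temp_july_lag1 <- c(NA, head(temp_july, -1))

# Simulate nut production so that a success is never followed by a success.
nut <- integer(n)
for (t in seq_len(n)) {
  if (t > 1 && nut[t - 1] == 1) {
    nut[t] <- 0L                                   # deterministic part
  } else {
    p <- plogis(-0.5 + 0.8 * (temp_july[t] - 18))  # stochastic part
    nut[t] <- rbinom(1, 1, p)
  }
}
nut[c(30, 61)] <- NA  # two missing years (positions are arbitrary)

dat <- data.frame(year = seq_len(n), nut, temp_july, temp_july_lag1)

# Outcome of the preceding year, used only to decide which rows to keep.
dat$prev_nut <- c(NA, head(dat$nut, -1))

# Keep only years whose preceding year is known to be unsuccessful.
# The climate columns of excluded years were already used to build the lags.
fit_dat <- subset(dat, !is.na(nut) & !is.na(prev_nut) & prev_nut == 0)

fit <- glm(nut ~ temp_july + temp_july_lag1,
           family = binomial, data = fit_dat)
summary(fit)
```

Note that the lagged predictors are built from the full series before any rows are dropped, so the climate data for an excluded year still enter the model as lags for later years.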

Related Solutions

Solved – GLM with Temporal Data

I'm still learning a lot in this area, but since you don't have an answer yet, my thoughts are...

The correlation structures you specify in the functions that allow them (gls, lme, etc.) describe within-group correlation, so I don't believe AR1 is correct, since the multiple measurements fall within the same time frame.

Perhaps you want (I created dat2, which centers your variables):

library(nlme)
gls(wat ~ rain + temp, data = dat2, correlation = corCompSymm(form = ~ 1 | month))

which gives answers, in your example, similar to GEE:

library(geepack)
geeglm(wat ~ rain + temp, data = dat2, id = month, corstr = "exchangeable")

Unfortunately, I've read several papers on GEE vs. GLMM and still haven't figured out whether GEE would be applicable in such a case. There are several threads on this, one of which is:

What is the difference between generalized estimating equations and GLMM?

Hope that helps.

Solved – How to account for temporal autocorrelation in mixed effects logistic regression

If you include the time variable in the random-effects structure of the model, you account for temporal autocorrelation. You could then use likelihood ratio tests to evaluate whether a more complex temporal structure is required, by including nonlinear time effects in the random effects via polynomials or splines.
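One way this might look in R is sketched below. It assumes the lme4 package is installed and uses simulated data; the variable names (`y`, `time`, `subject`) and all simulation settings are hypothetical. It compares a random-intercept model with one that also has a random slope for time, using a likelihood ratio test.

```r
library(lme4)  # assumed to be installed

# Simulated repeated binary measurements: 40 subjects, 8 time points each.
set.seed(42)
n_subj <- 40
n_time <- 8
d <- expand.grid(time = 0:(n_time - 1), subject = factor(seq_len(n_subj)))

b0 <- rnorm(n_subj, sd = 1)    # subject-specific intercept deviations
b1 <- rnorm(n_subj, sd = 0.3)  # subject-specific time-slope deviations
idx <- as.integer(d$subject)
eta <- -0.5 + 0.2 * d$time + b0[idx] + b1[idx] * d$time
d$y <- rbinom(nrow(d), 1, plogis(eta))

# Random intercept only
m_int <- glmer(y ~ time + (1 | subject), data = d, family = binomial)

# Time included in the random-effects structure (random slope for time)
m_slope <- glmer(y ~ time + (time | subject), data = d, family = binomial)

# Likelihood ratio test between the two random-effects structures
anova(m_int, m_slope)
```

Fits like these can produce convergence or singular-fit warnings on small datasets. A nonlinear time effect could be tried in the same way, for example by replacing `time` in the random-effects term with a polynomial such as `poly(time, 2)`.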
