How to model binary dependent data with temporal autocorrelation

Tags: autocorrelation, binary data, r, time series

I am trying to model annual tree nut production using climate predictors.

The nut data (the dependent variable) are a binary time series (0 = unsuccessful, 1 = successful nut production) with one observation per year. The series spans 90 years, with two missing years (88 observations).

The independent variables are monthly climate variables, including values from months in previous years (for example, Temp.July.t and Temp.July.t-1).

I'm using R, and have a basic-to-intermediate knowledge of statistics.

My problem is that the dependent data has strong temporal autocorrelation (nut production cannot be successful two years running). I'm looking for a pointer towards a technique that will allow me to deal with the autocorrelation in the binary data and create a statistical model that allows me to investigate the relationship between nut production and climate.

Thank you.

Best Answer

If the only autocorrelation present is that a successful year must be followed by an unsuccessful year, a simple and effective approach is to fit a binary logistic regression to a dataset that excludes the dependent variable for every year in which the preceding year was successful. Because those years were already determined to be unsuccessful by the success of the preceding year, their outcome carries no information about how the independent climate variables relate to success. Since successes cannot occur in consecutive years, at most about half of the 90 years can be excluded, so the remaining dataset should still be reasonably large.
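As a rough illustration of that fitting step, here is a minimal sketch in base R. The data are simulated, since the real series is not shown in the question, and the names `temp_july`, `prev_nut` and `nut_free`, as well as the coefficients used in the simulation, are made up for the example. The first year is kept because its preceding year is unknown; how to treat years that follow a missing year in the real data is a judgment call that this sketch does not address.

```r
# Simulated stand-in data; the real nut and climate series are not available here.
set.seed(1)
n_years <- 90
temp_july <- rnorm(n_years, mean = 18, sd = 1.5)   # hypothetical climate predictor

# Simulate the stated constraint: no success directly after a success.
nut <- integer(n_years)
for (t in seq_len(n_years)) {
  after_success <- t > 1 && nut[t - 1] == 1
  p <- if (after_success) 0 else plogis(-0.5 + 0.8 * (temp_july[t] - 18))
  nut[t] <- rbinom(1, size = 1, prob = p)
}

dat <- data.frame(year = seq_len(n_years), nut = nut, temp_july = temp_july)

# Previous year's outcome (NA for the first year, whose predecessor is unknown).
dat$prev_nut <- c(NA, head(dat$nut, -1))

# Blank out the response only, in years that directly follow a success.
dat$nut_free <- ifelse(!is.na(dat$prev_nut) & dat$prev_nut == 1, NA, dat$nut)

# glm() drops rows with an NA response under the default na.action (na.omit).
fit <- glm(nut_free ~ temp_july, family = binomial, data = dat)
coef(summary(fit))
```

The fitted coefficients then describe the probability of success only in years when success was possible at all, which is the part of the process that the climate variables can influence.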

The resulting model will be partly deterministic (the year after a successful year is an unsuccessful year) and partly stochastic (the year after an unsuccessful year has a certain probability of success, dependent on the climate variables, as estimated by the regression).
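To make the two parts concrete, the sketch below writes the combined prediction rule as a small R function. The coefficient values and the function name `p_success` are hypothetical; in practice the coefficients would come from `coef()` of the fitted logistic regression, and the single predictor here (assumed to be centred) stands in for whatever climate variables end up in the model.

```r
# Hypothetical coefficients, standing in for coef() of a fitted logistic regression.
beta <- c(intercept = -0.5, temp_july = 0.8)

# P(success in year t), given last year's outcome and this year's predictor value.
p_success <- function(prev_nut, temp_july, beta) {
  # Stochastic part: logistic model for years in which success is possible.
  p_free <- plogis(beta[["intercept"]] + beta[["temp_july"]] * temp_july)
  # Deterministic part: probability is zero in the year after a success.
  ifelse(prev_nut == 1, 0, p_free)
}

p_success(prev_nut = 1, temp_july = 2, beta = beta)   # 0, whatever the climate
p_success(prev_nut = 0, temp_july = 0, beta = beta)   # equals plogis(-0.5)
```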

(With apologies if this is stating the obvious) excluding the dependent variable for year t because year t-1 was successful does not mean dropping all the data for year t. The independent variables for the months in year t are still potentially relevant as lagged variables influencing success or otherwise in years after t.
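A small made-up example may help show the difference between masking the response and dropping the row. The values and the helper `lag1` are purely illustrative, and the helper assumes one row per consecutive year, so with the real data the two missing years would need to be present as rows (with NA outcomes) for the lags to line up.

```r
dat <- data.frame(
  year      = 2001:2006,
  nut       = c(0, 1, 0, 0, 1, 0),
  temp_july = c(17.2, 19.1, 18.4, 16.9, 19.6, 17.8)   # made-up values
)

# Lag helper: value from the previous row (assumes consecutive years, one per row).
lag1 <- function(x) c(NA, head(x, -1))

dat$temp_july_lag1 <- lag1(dat$temp_july)
dat$prev_nut       <- lag1(dat$nut)

# Mask the response only; the predictor columns stay intact in every row.
dat$nut_free <- ifelse(!is.na(dat$prev_nut) & dat$prev_nut == 1, NA, dat$nut)

dat
```

Here 2003 follows the successful year 2002, so its response is set to NA, but its July temperature (18.4) still appears as `temp_july_lag1` in the 2004 row and can contribute to the model there.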
