Using information on both sides of a ‘gap’ in time series data for imputation

Tags: data-imputation, forecasting, r, time series

As with my previous question, I'm looking at ways to impute missing data in hierarchical time series.

With all my other procedures, including experiments with imputation packages (Amelia, HoltWinters with forecast, and MICE), I've only been able to use the time series data before the gap.

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001 220 194 238 190 217 244 242 225 242 259 267 244
2002 212 246 250 236 261 286 265 269 226 267 234 246
2003 202 199 297 272 236 266 235 226 260 183 226 265
2004 211 215 219 213 240 236 273 266 262 244 241 235
2005 212 198 233 251 259 282 305 267 241 264 222 269
2006 182 220 250 287 279 281 286 332 300 272 221 233
2007  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2008 193 215 235 242 246 315 326 280 279 239 236 258
2009 246 189 257 241 268 223 260 288 234 260 216 195

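To work with the gap in R, the table can be entered as a monthly ts object with NA for every month of 2007. This sketch names the series `ts.datt` to match the `auto.arima` call further down; the `before`/`after` split around the gap is an illustrative addition, not part of the original workflow.

```r
# Monthly values from the table; 2007 is entirely missing.
vals <- c(
  220, 194, 238, 190, 217, 244, 242, 225, 242, 259, 267, 244,  # 2001
  212, 246, 250, 236, 261, 286, 265, 269, 226, 267, 234, 246,  # 2002
  202, 199, 297, 272, 236, 266, 235, 226, 260, 183, 226, 265,  # 2003
  211, 215, 219, 213, 240, 236, 273, 266, 262, 244, 241, 235,  # 2004
  212, 198, 233, 251, 259, 282, 305, 267, 241, 264, 222, 269,  # 2005
  182, 220, 250, 287, 279, 281, 286, 332, 300, 272, 221, 233,  # 2006
  rep(NA, 12),                                                 # 2007 (gap)
  193, 215, 235, 242, 246, 315, 326, 280, 279, 239, 236, 258,  # 2008
  246, 189, 257, 241, 268, 223, 260, 288, 234, 260, 216, 195   # 2009
)
ts.datt <- ts(vals, start = c(2001, 1), frequency = 12)

# The observed stretches on either side of the gap.
before <- window(ts.datt, end = c(2006, 12))   # 72 months
after  <- window(ts.datt, start = c(2008, 1))  # 24 months
```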
I'm trying to do a simple imputation procedure that uses forecast and backcast estimates from a time series model: forecasting uses the data before the gap to predict forward into it, and backcasting uses the data after the gap to “predict” backward into it.

I would then like to combine the forecast and backcast values and use the result as the imputed values, after which I will assess the fit.
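One way to combine the two is a weighted average whose weights shift across the gap, trusting the forecast more at the start of 2007 and the backcast more at the end. The linear weighting here is an assumption (a plain average, (fc + bc) / 2, is the simplest alternative), and `combine_fb` is a hypothetical helper name:

```r
# Blend a forecast (fc) and a backcast (bc) of the same gap.
# The forecast weight falls linearly from near 1 at the first missing
# step to near 0 at the last; the backcast gets the complement.
combine_fb <- function(fc, bc) {
  stopifnot(length(fc) == length(bc))
  h <- length(fc)
  w <- (h:1) / (h + 1)  # e.g. h = 3 gives 0.75, 0.50, 0.25
  w * fc + (1 - w) * bc
}

# Toy example with a 3-step gap (made-up numbers)
combine_fb(fc = c(100, 100, 100), bc = c(200, 200, 200))
```

For the toy inputs this returns 125, 150, 175: the first step is three-quarters forecast and the last is three-quarters backcast. For the 12-month gap, the first and last months get weights of 12/13 and 1/13.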

How do I code this?

For example, I'm able to identify a SARIMA model for the first period (2001 to the end of 2006), but not for the full period, because the basic R functions I know do not support NA values.

This is the fit for 2001 to the end of 2006 only:

ARIMA(2,0,2)(1,0,1)[12] with non-zero mean 

Call: auto.arima(x = ts.datt) 

Coefficients:
         ar1      ar2      ma1     ma2    sar1     sma1  intercept
      1.3610  -0.8258  -1.2407  0.9191  0.8982  -0.7560   244.8374
s.e.  0.0884   0.0960   0.0878  0.1127  0.2190   0.3335     6.1894

sigma^2 estimated as 605.9:  log likelihood = -335.01
AIC = 686.02   AICc = 688.3   BIC = 704.23

Should I just model the first period and forecast from it, then model the last period separately and backcast from it? How would I do the backcasting (i.e. 'predicting' the past)?
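Backcasting can be done by reversing the post-gap data in time, fitting a model to the reversed series, forecasting it forward, and reversing the predictions back into calendar order. A stationary process has the same autocovariances in both time directions, so the same kind of model is appropriate for the reversed series. This sketch uses base R's stats::arima(); the seasonal AR(1)(1)[12] order, method = "ML", the helper `fit_model`, and the equal-weight average are all assumptions to replace with your own choices. Only 24 observations follow the gap, so the backward model is necessarily crude.

```r
# Same data as the table: 2001-2006, 2007 missing, 2008-2009.
vals <- c(
  220, 194, 238, 190, 217, 244, 242, 225, 242, 259, 267, 244,
  212, 246, 250, 236, 261, 286, 265, 269, 226, 267, 234, 246,
  202, 199, 297, 272, 236, 266, 235, 226, 260, 183, 226, 265,
  211, 215, 219, 213, 240, 236, 273, 266, 262, 244, 241, 235,
  212, 198, 233, 251, 259, 282, 305, 267, 241, 264, 222, 269,
  182, 220, 250, 287, 279, 281, 286, 332, 300, 272, 221, 233,
  rep(NA, 12),
  193, 215, 235, 242, 246, 315, 326, 280, 279, 239, 236, 258,
  246, 189, 257, 241, 268, 223, 260, 288, 234, 260, 216, 195
)
y <- ts(vals, start = c(2001, 1), frequency = 12)
before <- window(y, end = c(2006, 12))
after  <- window(y, start = c(2008, 1))

# Hypothetical helper: an assumed seasonal AR model, fitted by exact ML.
fit_model <- function(x) {
  arima(x, order = c(1, 0, 0),
        seasonal = list(order = c(1, 0, 0), period = 12),
        method = "ML")
}

# Forecast: fit to 2001-2006 and predict Jan-Dec 2007.
fc <- as.numeric(predict(fit_model(before), n.ahead = 12)$pred)

# Backcast: reverse 2008-2009, fit, "forecast" 12 steps
# (Dec 2007 back to Jan 2007), then reverse into calendar order.
after_rev <- ts(rev(as.numeric(after)), frequency = 12)
bc <- rev(as.numeric(predict(fit_model(after_rev), n.ahead = 12)$pred))

# Equal-weight combination, written into the gap.
imputed <- (fc + bc) / 2
y_filled <- y
window(y_filled, start = c(2007, 1), end = c(2007, 12)) <- imputed
```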

EDIT:
What I'm asking:
1) How do I use the data from 2008 and 2009 to BACKCAST? I already know how to use 2001-2006 to forecast.

2) How do I determine the SARIMA model for the whole period (2001-2009)?
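On question 2: base R's stats::arima() evaluates the likelihood with a Kalman filter and allows missing values (handled exactly with method = "ML"), so candidate SARIMA orders can be fitted to the full 2001-2009 series and compared by AIC. The short candidate list below is an assumption, not an exhaustive search (the last entry is the order found for 2001-2006); fits that fail are skipped via tryCatch:

```r
# Full 2001-2009 series with the 2007 gap left as NA.
vals <- c(
  220, 194, 238, 190, 217, 244, 242, 225, 242, 259, 267, 244,
  212, 246, 250, 236, 261, 286, 265, 269, 226, 267, 234, 246,
  202, 199, 297, 272, 236, 266, 235, 226, 260, 183, 226, 265,
  211, 215, 219, 213, 240, 236, 273, 266, 262, 244, 241, 235,
  212, 198, 233, 251, 259, 282, 305, 267, 241, 264, 222, 269,
  182, 220, 250, 287, 279, 281, 286, 332, 300, 272, 221, 233,
  rep(NA, 12),
  193, 215, 235, 242, 246, 315, 326, 280, 279, 239, 236, 258,
  246, 189, 257, 241, 268, 223, 260, 288, 234, 260, 216, 195
)
y <- ts(vals, start = c(2001, 1), frequency = 12)

# Assumed candidate (p,d,q)(P,D,Q)[12] orders.
candidates <- list(
  list(order = c(1, 0, 0), seasonal = c(1, 0, 0)),
  list(order = c(1, 0, 1), seasonal = c(1, 0, 0)),
  list(order = c(2, 0, 2), seasonal = c(1, 0, 1))  # order found for 2001-2006
)

# Fit each candidate to the series with NAs; return NULL on failure.
fits <- lapply(candidates, function(m) {
  tryCatch(
    arima(y, order = m$order,
          seasonal = list(order = m$seasonal, period = 12),
          method = "ML"),
    error = function(e) NULL
  )
})

# Lowest AIC wins; failed fits get Inf so they are never chosen.
aics <- sapply(fits, function(f) if (is.null(f)) Inf else AIC(f))
best <- fits[[which.min(aics)]]
best
```

best$nobs should be 96, the number of non-missing months, which shows that the gap was skipped in the likelihood rather than filled in.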

Best Answer

Try na.StructTS from the zoo package; it has methods for both zoo and ts series. For example, take the built-in USAccDeaths series, insert some NAs, and then interpolate them:

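library(zoo)
# Blank out all 12 months of 1975 (creates a modified copy of USAccDeaths in the workspace)
window(USAccDeaths, 1975, c(1975, 12)) <- NA
# Fill the NAs from a Kalman-smoothed structural time series model
na.StructTS(USAccDeaths)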

See ?na.StructTS for more.
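For a monthly ts, na.StructTS essentially fits a basic structural model (level, slope and seasonal components) with StructTS() and fills the NAs from the Kalman smoother, tsSmooth(), whose state estimates use observations on both sides of the gap. This base-R sketch reproduces that idea on the question's data; taking the filled value as smoothed level plus smoothed seasonal component is an assumed reading of the model, not a copy of zoo's source:

```r
# Question data with the 2007 gap left as NA.
vals <- c(
  220, 194, 238, 190, 217, 244, 242, 225, 242, 259, 267, 244,
  212, 246, 250, 236, 261, 286, 265, 269, 226, 267, 234, 246,
  202, 199, 297, 272, 236, 266, 235, 226, 260, 183, 226, 265,
  211, 215, 219, 213, 240, 236, 273, 266, 262, 244, 241, 235,
  212, 198, 233, 251, 259, 282, 305, 267, 241, 264, 222, 269,
  182, 220, 250, 287, 279, 281, 286, 332, 300, 272, 221, 233,
  rep(NA, 12),
  193, 215, 235, 242, 246, 315, 326, 280, 279, 239, 236, 258,
  246, 189, 257, 241, 268, 223, 260, 288, 234, 260, 216, 195
)
y <- ts(vals, start = c(2001, 1), frequency = 12)

# Basic structural model; StructTS() handles NAs inside the Kalman filter.
fit <- StructTS(y, type = "BSM")

# Smoothed states (columns "level", "slope", "sea") use the whole series.
sm <- tsSmooth(fit)

# Signal = smoothed level + smoothed seasonal component.
signal <- sm[, "level"] + sm[, "sea"]

# Replace only the missing months; observed values are left untouched.
y_filled <- y
y_filled[is.na(y)] <- signal[is.na(y)]
```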