Gaps in time series and time series validity

After doing some reading on CrossValidated, I understand that we can use "imputation" techniques to fill in the gaps (if they occur at random). But I am not clear on the following questions:

  1. How many consecutive gaps make a data set invalid for forecasting?

  2. How many gaps in total make a data set invalid?

For example, I have hourly data for a week, which means 168 (24 × 7) points in my data set.

  1. Case 1: suppose 3 consecutive days of data (72 hourly points) are missing. Can we still consider the data set valid?
  2. Case 2: suppose 80 of the 168 points are missing overall. Can we still consider the data set valid?
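
Both questions come down to two quantities that are easy to measure: the longest run of consecutive missing points and the total number of missing points. Below is a minimal Java sketch that computes both, assuming missing hours are stored as `Double.NaN`. The class `GapStats` and its methods are hypothetical names used only for illustration. It measures the gaps; it does not supply a validity threshold, since that depends on the analysis.

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Minimal sketch: summarise the gaps in an hourly series.
 * Assumption: missing observations are stored as Double.NaN.
 */
public class GapStats {

    /** Length of the longest run of consecutive missing (NaN) values. */
    public static int longestGap(double[] series) {
        int longest = 0;
        int current = 0;
        for (double v : series) {
            if (Double.isNaN(v)) {
                current++;
                longest = Math.max(longest, current);
            } else {
                current = 0; // an observed value ends the current run
            }
        }
        return longest;
    }

    /** Total number of missing (NaN) values anywhere in the series. */
    public static int missingCount(double[] series) {
        int count = 0;
        for (double v : series) {
            if (Double.isNaN(v)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // One week of hourly data: 24 * 7 = 168 points.
        double[] week = new double[24 * 7];
        Arrays.fill(week, 1.0);
        // Case 1: remove 3 consecutive days (72 hours), starting at the second day (index 24).
        Arrays.fill(week, 24, 24 + 72, Double.NaN);
        System.out.println("longest gap = " + longestGap(week) + " h");
        System.out.printf(Locale.ROOT, "missing = %d of %d (%.1f%%)%n",
                missingCount(week), week.length,
                100.0 * missingCount(week) / week.length);
    }
}
```

For Case 1 this reports a longest gap of 72 hours and 72 of 168 points missing (42.9%); Case 2 would be 80 of 168 (about 47.6%), with the longest run depending on how those 80 points are spread.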

I am using a Holt-Winters implementation in Java for forecasting.
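
Standard Holt-Winters recursions update the level, trend and seasonal terms from an observed value at every time step, so a gap has to be handled before the series is passed in. One simple option, swapped in here for illustration and not taken from any particular Java library, is linear interpolation across interior gaps. The sketch below assumes missing values are `Double.NaN`; `LinearFill` and `fill` are hypothetical names. This is single imputation, so it does not carry the extra uncertainty that multiple imputation is designed to capture.

```java
import java.util.Arrays;

/**
 * Minimal sketch: fill interior gaps by linear interpolation so that a
 * Holt-Winters style recursion sees a value at every hour.
 * Assumptions: missing values are Double.NaN; leading and trailing gaps are
 * left as NaN because there is no neighbour on one side to interpolate from.
 * This is single imputation: it does NOT reflect imputation uncertainty.
 */
public class LinearFill {

    public static double[] fill(double[] series) {
        double[] out = Arrays.copyOf(series, series.length);
        int lastObserved = -1; // index of the most recent non-missing value
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                continue; // keep scanning until the gap closes
            }
            if (lastObserved >= 0 && i - lastObserved > 1) {
                // Interior gap between lastObserved and i: draw a straight line.
                double start = out[lastObserved];
                double step = (out[i] - start) / (i - lastObserved);
                for (int j = lastObserved + 1; j < i; j++) {
                    out[j] = start + step * (j - lastObserved);
                }
            }
            lastObserved = i;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] hourly = {10.0, Double.NaN, Double.NaN, 16.0, 18.0};
        System.out.println(Arrays.toString(fill(hourly)));
        // prints [10.0, 12.0, 14.0, 16.0, 18.0]
    }
}
```

A straight line is reasonable for a gap of a few hours, but across a 72-hour gap it flattens the daily cycle that the seasonal component of Holt-Winters is meant to model, which is one reason long consecutive gaps are more damaging than the same number of scattered ones.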

Any help would be appreciated.

Best Answer

I am not sure what you mean by "a valid data set". Are you sure what you mean by it? In a single time series or in multiple time series, there are situations where consecutive missingness would be irrelevant to the validity of an analysis, and situations where it would be lethal to valid inference.

That said, Honaker and King are at the forefront of practical multiple imputation in a time-series context:

Honaker, J. and King, G. (2010). What to do about missing values in time-series cross-section data. American Journal of Political Science, 54(2), 561–581. (See also the related R package Amelia II on CRAN.)

It is not clear how familiar you are with multiple imputation, but it has two aims: (1) to support inference that is not biased when data are missing at random (MAR) or missing completely at random (MCAR), i.e. to impute a set of reasonable values; and (2) in doing so, to incorporate into one's analysis the additional uncertainty due to the missing data, i.e. the extra variation that results from the imputed values not all agreeing with one another.
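
Aim (2) is usually made concrete by Rubin's rules, the standard way of combining the results from m imputed data sets: the pooled estimate is the mean of the m estimates, and the total variance is T = W + (1 + 1/m) B, where W is the average within-imputation variance and B is the between-imputation variance, i.e. how much the imputations disagree. Below is a minimal Java sketch of that pooling step only; generating the imputations (for example with Amelia II) is out of scope. `RubinPooling` and its methods are hypothetical names, and each "estimate" could be, say, a Holt-Winters forecast fitted to one completed data set together with its squared standard error.

```java
/**
 * Minimal sketch of Rubin's rules for pooling results from m imputed data sets.
 * estimates[k] is the point estimate from the k-th completed data set and
 * variances[k] is its squared standard error.
 */
public class RubinPooling {

    /** Pooled point estimate: the mean of the m estimates. */
    public static double pooledEstimate(double[] estimates) {
        double sum = 0.0;
        for (double q : estimates) {
            sum += q;
        }
        return sum / estimates.length;
    }

    /** Total variance T = W + (1 + 1/m) * B. */
    public static double totalVariance(double[] estimates, double[] variances) {
        int m = estimates.length;
        if (m < 2 || variances.length != m) {
            throw new IllegalArgumentException("need m >= 2 matching estimates and variances");
        }
        double qBar = pooledEstimate(estimates);
        // W: average within-imputation variance.
        double w = 0.0;
        for (double u : variances) {
            w += u;
        }
        w /= m;
        // B: between-imputation variance, i.e. the disagreement between imputations.
        double b = 0.0;
        for (double q : estimates) {
            b += (q - qBar) * (q - qBar);
        }
        b /= (m - 1);
        return w + (1.0 + 1.0 / m) * b;
    }

    public static void main(String[] args) {
        double[] est = {10.0, 12.0, 14.0};
        double[] var = {1.0, 1.0, 1.0};
        System.out.println(pooledEstimate(est)); // prints 12.0
        System.out.println(totalVariance(est, var));
    }
}
```

In the example the within-imputation variance is 1 but the total variance is 1 + (4/3) × 4 ≈ 6.33, because the three imputations disagree; a single imputation would have reported only the within-imputation part.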