Solved – backfill missing data in a time series

missing-data time-series

I have a time series {y_t, t=1,2,3,…,N} in which y_t is missing over the period [t=s, t=s+1, …, t=s+M]. I want to backfill y_t using a regression on another time series {x_t} in a rolling manner. Specifically, I use the sample before t=s to regress y_t on x_t and predict y_s from the fitted model; then I add the point at t=s to the sample and re-run the regression to predict y_{s+1}; and so on, until I have predicted y_{s+M}. Having done this, I find that the predicted value of y_{s+M} is quite different from the observed y_{s+M+1}, which creates a big jump in the series between t=s+M and t=s+M+1. Is there any way to get rid of this issue, or is there a better method for backfilling y_t? Thanks a lot.
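For concreteness, the rolling forward backfill described in the question might look like the following. This is a minimal sketch, assuming a single regressor, an OLS fit via `numpy.linalg.lstsq`, and synthetic data; all names and numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration (hypothetical data): y depends linearly on x,
# and y is missing on the block [s, s + M].
N, s, M = 200, 100, 20
x = rng.normal(size=N)
y = 2.0 + 1.5 * x + rng.normal(scale=0.3, size=N)

y_fill = y.copy()
y_fill[s:s + M + 1] = np.nan  # the gap to backfill

# Rolling forward backfill: fit y ~ x on everything before t, predict
# y_t, then treat that prediction as data for the next step.
for t in range(s, s + M + 1):
    X = np.column_stack([np.ones(t), x[:t]])          # intercept + slope
    beta, *_ = np.linalg.lstsq(X, y_fill[:t], rcond=None)
    y_fill[t] = beta[0] + beta[1] * x[t]
```

Note that from t = s + 1 onward the fit includes the model's own earlier predictions, which is exactly why errors can accumulate toward the end of the gap.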

Best Answer

Let's define $\hat{y}_0^f, \ldots, \hat{y}_M^f$ as the forward predictor you built. That is, $\hat{y}^f_0$ is your estimate for $y_{s + 0}$ based on values before $s + 0$, $\hat{y}_1^f$ is your estimate for $y_{s + 1}$ based on values before $s + 1$, and so on. It stands to reason that the MSE of $\hat{y}_i^f$ is increasing in $i$ (that is, the error grows as you try to predict farther into the future).

You could also go the other way, and build a backward predictor. That is, $\hat{y}^b_0$ is your estimate for $y_{s + M - 0}$ based on values after $s + M - 0$, $\hat{y}_1^b$ is your estimate for $y_{s + M - 1}$ based on values after $s + M - 1$, and so on.
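The backward predictor is the mirror image of the forward loop: fit only on data after the gap and walk from s + M down to s. A minimal sketch under the same hypothetical setup (single regressor, OLS, synthetic data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data, as before: y linear in x, missing on [s, s + M].
N, s, M = 200, 100, 20
x = rng.normal(size=N)
y = 2.0 + 1.5 * x + rng.normal(scale=0.3, size=N)

y_fill = y.copy()
y_fill[s:s + M + 1] = np.nan

# Backward backfill: fit y ~ x on everything AFTER t, predict y_t,
# then include that prediction when stepping back to t - 1.
for t in range(s + M, s - 1, -1):
    n_after = N - (t + 1)
    X = np.column_stack([np.ones(n_after), x[t + 1:]])
    beta, *_ = np.linalg.lstsq(X, y_fill[t + 1:], rcond=None)
    y_fill[t] = beta[0] + beta[1] * x[t]
```

By construction this predictor matches the observed data at the right edge of the gap, so its error grows as it walks back toward $s$.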

Since the forward predictor will probably be better for times close to $s$, since the backward predictor will probably be better for times close to $s + M$, and since you want continuity at both ends, one natural way to go is to use the weighted average

$$ \hat{y}_i^c = \frac{w_{M - i}^b \hat{y}_i^f + w_i^f \hat{y}_{M - i}^b}{w_i^f + w_{M - i }^b}, $$

where $w_i^f$, $w_{M - i }^b$ are estimates of the MSEs of the forward and backward predictors. Note that each prediction is weighted by the *other* predictor's MSE; this is the usual inverse-variance weighting, so the more accurate predictor dominates, and the combination reduces to the forward predictor at $i = 0$ and to the backward predictor at $i = M$. You can estimate these MSEs from the rest of the data. Alternatively, you could guess that

$$ w_i^f \sim i, \qquad w_{M - i}^b \sim M - i , $$

because the variance of a random walk grows linearly in the number of steps taken.
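The blend above can be sketched in a few lines. This assumes the backward track has already been re-indexed so that `y_b[i]` targets the same point $y_{s+i}$ as `y_f[i]`, and uses the linear MSE proxies; the prediction arrays here are hypothetical stand-ins.

```python
import numpy as np

M = 20
i = np.arange(M + 1)

# Hypothetical forward and backward prediction tracks over the gap,
# both indexed so that element i targets y_{s+i}.
y_f = np.linspace(1.0, 2.0, M + 1)
y_b = np.linspace(1.2, 1.8, M + 1)

w_f = i.astype(float)          # MSE proxy for the forward predictor
w_b = (M - i).astype(float)    # MSE proxy for the backward predictor

# Inverse-variance weighting: each track is weighted by the OTHER
# track's MSE, so the combination equals y_f at i = 0 and y_b at i = M.
y_c = (w_b * y_f + w_f * y_b) / (w_f + w_b)
```

With these linear weights the denominator is constant ($M$), so there is no division-by-zero issue, and the combined series ties continuously into the observed data at both ends of the gap.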
