Solved – How many data points to accurately approximate the average of recent values in a time series

mean, statistical significance, time series

I have some time series data, where each data point in the time series has a timestamp and a value. For example:

Time 1    Value 5
Time 2    Value 7
Time 3    Value 10

Now at a given time X, I want to obtain an approximation of the average of recent values in the time series. For example, consider that the time series consists of temperature readings for a machine, and I want to calculate the average temperature over recent days. To do this, I must choose the number of data points used to calculate the average.
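To make the setup concrete, here is a minimal sketch in Python (the names and numbers are illustrative, not part of the question) of averaging the last k values; choosing k is exactly the open question:

    # A minimal sketch: average the last k values of a (timestamp, value) series.
    from statistics import mean

    readings = [(1, 5.0), (2, 7.0), (3, 10.0)]  # (timestamp, value) pairs

    def recent_average(series, k):
        """Average of the last k values in the time series."""
        values = [v for _, v in series[-k:]]
        return mean(values)

    print(recent_average(readings, 2))  # 8.5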

On the one hand, if I choose to use a small number of values, such as the last two values of the time series, the average will not be very reliable because those data points may be outliers.

On the other hand, if I choose to use all previous values of the time series, the average will better tolerate outliers but may not reflect recent trends in the time series.

Thus, my question is: how can I determine a minimum number of data points such that the average of recent values in a time series can be considered "reliable"? If possible, the concept of "reliable" should be expressed in terms of a confidence interval.
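One way to make "reliable" operational, if the last n points can be treated as approximately i.i.d. within the window (a strong assumption for autocorrelated data), is to grow the window until the confidence-interval half-width for the mean falls below a tolerance. A hedged sketch, where `tolerance` and the 95% level are illustrative choices:

    # Grow the window until the approximate 95% CI half-width for the mean
    # of the most recent n values drops below `tolerance`. Assumes the values
    # inside the window behave roughly like i.i.d. draws, which may not hold
    # for an autocorrelated or non-stationary series.
    import math
    from statistics import mean, stdev

    def smallest_reliable_window(series, tolerance, z=1.96, max_n=None):
        values = [v for _, v in series]
        max_n = max_n or len(values)
        for n in range(2, max_n + 1):
            recent = values[-n:]
            half_width = z * stdev(recent) / math.sqrt(n)
            if half_width <= tolerance:
                return n, mean(recent), half_width
        return None  # no window meets the tolerance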

Of course, the concept of "recent values" is fuzzy, but there is certainly some lower bound on the number of data points that should be used to calculate an average reliably.

Just to make my question clearer, I will describe the solution I have thought about, which is in my opinion not good. I have thought about using the Hoeffding bound (https://en.wikipedia.org/wiki/Hoeffding%27s_inequality), as it "provides an upper bound on the probability that the sum of independent random variables deviates from its expected value". But I think it is inappropriate for my problem with a time series, as the observations may not be independent and the time series may not be stationary. So what could be used as a better technique?
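For reference, under the independence and boundedness assumptions that Hoeffding's inequality does require, it gives a closed-form minimum sample size. The sketch below only illustrates that calculation and inherits the same doubts about independence and stationarity raised above; the parameter values are illustrative:

    # Hoeffding: for independent values bounded in [a, b], the sample mean is
    # within `eps` of its expectation with probability at least 1 - delta once
    #     n >= (b - a)^2 * ln(2 / delta) / (2 * eps^2).
    import math

    def hoeffding_sample_size(a, b, eps, delta):
        return math.ceil((b - a) ** 2 * math.log(2 / delta) / (2 * eps ** 2))

    # e.g. temperatures bounded in [0, 100], within 5 degrees, 95% confidence
    print(hoeffding_sample_size(a=0, b=100, eps=5, delta=0.05))  # 738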

Edit: I have edited the question since some people decided to put the question on hold.

Best Answer

I agree with Carl that fitting a model to the data would be an ideal solution, if it makes sense in your context. But I also want to suggest that the following way of thinking about your problem might be helpful.

Suppose that I have a time-varying process that looks like a random walk (so that the location is autocorrelated through time). Suppose I measure this position with some measurement error (which is an IID random variable) at equally spaced time points. I understand your question to mean:

How many time points should I average in order to get the best possible information about the process' current location?

If the random walk careens around wildly and the measurement error is small, then the answer might well be to use only the most recent point. Using previous points in the average would introduce lots of extra variation due to the random walk, and a single point is already a reasonably good approximation of the process' true location. If the measurement error is large and the random walk moves slowly, then the answer will be to use a lot of points. The random walk changes little over the window, and you need to average over the noise in the measurements. Interestingly, if your measurement noise is extremely fat-tailed (e.g. Cauchy-distributed), then the answer will always be to use just the most recent point (because the average of multiple points does not approximate the central tendency of that distribution any better than a single point does!).
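Here is a small simulation sketch of this thought experiment (all parameter values are illustrative): generate a Gaussian random walk, observe it with IID Gaussian measurement error, and score each candidate window size by squared error against the true current position.

    # Gaussian random walk + IID Gaussian measurement noise; for each window
    # size n, estimate the current location by the mean of the last n
    # observations and accumulate the squared error over repeated trials.
    import random
    import statistics

    def best_window(step_sd, noise_sd, series_length=500, max_n=50, trials=200, seed=0):
        rng = random.Random(seed)
        mse = {n: 0.0 for n in range(1, max_n + 1)}
        for _ in range(trials):
            x = 0.0
            observations = []
            for _ in range(series_length):
                x += rng.gauss(0, step_sd)                        # random-walk step
                observations.append(x + rng.gauss(0, noise_sd))   # noisy reading
            for n in mse:
                estimate = statistics.fmean(observations[-n:])
                mse[n] += (estimate - x) ** 2 / trials
        return min(mse, key=mse.get)

    print(best_window(step_sd=1.0, noise_sd=0.1))  # fast walk, small noise -> few points
    print(best_window(step_sd=0.1, noise_sd=2.0))  # slow walk, big noise   -> many points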

It should be possible to work out the ideal number of points to use in special cases where the distribution followed by the random walk and the distribution of the measurement error are both known. However, this is precisely the case where a model, as suggested by Carl, would be useful.
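As a rough sketch of one such special case: assuming the walk has i.i.d. steps of variance $\sigma_w^2$ and the measurement errors are i.i.d. with variance $\sigma_\varepsilon^2$, the mean of the last $n$ observations estimates the current location $X_t$ with mean squared error

$$\operatorname{MSE}(n) = \frac{\sigma_\varepsilon^2}{n} + \sigma_w^2\,\frac{(n-1)(2n-1)}{6n},$$

because the walk's drift away from $X_t$ contributes variance $\sigma_w^2\,(n-1)n(2n-1)/(6n^2)$ to the window mean. Treating $n$ as continuous and keeping the leading terms gives an optimum near $n^\ast \approx \sqrt{3}\,\sigma_\varepsilon/\sigma_w$, which matches the intuition above: small noise or a fast walk pushes $n^\ast$ toward a single point, while noisy readings of a slow walk push it up.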

Edit: Carl's comment also made me realize that it is very likely that a weighted average (one that weights more recent points more heavily) could outperform an average that imposes a hard-threshold cutoff for inclusion.
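A minimal sketch of such a weighted average is the exponentially weighted moving average, where the smoothing factor `alpha` below is an illustrative tuning choice that replaces the hard window-size cutoff:

    # Exponentially weighted moving average: more recent values get larger
    # weights, so no hard cutoff on the number of points is needed.
    def ewma(values, alpha=0.2):
        smoothed = values[0]
        for v in values[1:]:
            smoothed = alpha * v + (1 - alpha) * smoothed
        return smoothed

    print(ewma([5, 7, 10]))  # the latest reading carries the largest weight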
