Statistics – Most Scientific Way to Assign Weights to Historical Data

Tags: average, data-analysis, statistics

This is a common question I face while processing historical data. I have year-on-year data of an event for the past $N$ years. I would like to assign weights to the data of these $N$ years so that the data corresponding to the most recent year has the highest weight and the data corresponding to the oldest year has the least. This gives more importance to recent trends than to very old ones.

Questions

  1. Is there any scientific way to assign weights to the years, i.e., what should the weight be for the most recent year, what for the previous year, and so on?

  2. Is there any scientific justification for assigning equal weights to the historical data preceding a certain point?

  3. Is there any scientific justification for ignoring the historical data preceding a certain point, i.e., weight $= 0$?

I think the answers will depend on the exact nature of the data under study, and we might not have a general answer that works for all data. Even then, can we at least have a rule of thumb that is independent of the actual data? Any references or pointers to the literature would also be helpful.

Best Answer

There are several ways to do this, depending on your goals. You can find examples on any website that reports economic data, for instance stock-quote histories. Here are the most popular methods:

Method 1: Perhaps the most well-behaved function is the exponential, as Rahul suggested. Pick some number $a<1$ and use the geometric progression $1, a, a^2, a^3, \ldots$ to assign weights to the years, most recent first. The total sum is $\frac{1}{1-a}$, so if you want all weights to add up to $1$ you divide by it, giving normalized weights $(1-a), (1-a)a, (1-a)a^2, \ldots$ (For a finite history of $N$ years, divide by $\frac{1-a^N}{1-a}$ instead.)
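A minimal sketch of this scheme, assuming Python with NumPy (the function name and the choice $a = 0.8$ are illustrative, not from the answer):

```python
import numpy as np

def exponential_weights(n_years, a=0.8):
    """Geometric weights 1, a, a^2, ... (index 0 = most recent year),
    normalized to sum to 1."""
    w = a ** np.arange(n_years)   # 1, a, a^2, ..., a^(n_years - 1)
    return w / w.sum()            # divide by the finite sum (1 - a^n) / (1 - a)

print(exponential_weights(5))     # ~ [0.297 0.238 0.190 0.152 0.122]
```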

Method 2: If you want to give equal weights to the most recent $n$ years and weight $0$ to earlier ones, you can use what economists call a "running average": weight $\frac{1}{n}$ for each of the most recent $n$ years and $0$ for the rest. This scheme is popular as well, but not as well-behaved as the exponential because of its discontinuous cutoff. I guess this answers the second question as well.
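A sketch under the same assumptions (the names and the choice $n = 3$ are illustrative):

```python
import numpy as np

def running_average_weights(n_total, n_recent):
    """Weight 1/n_recent for each of the most recent n_recent years,
    0 for everything older (index 0 = most recent year)."""
    w = np.zeros(n_total)
    w[:n_recent] = 1.0 / n_recent
    return w

print(running_average_weights(8, 3))  # ~ [0.333 0.333 0.333 0. 0. 0. 0. 0.]
```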

Method 3: Some engineers prefer a sigmoid. This is an analytic function that assigns roughly equal weights to the most recent data and then quickly decays to $0$, but without the discontinuous step of the running average.
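One way to realize this, taking the logistic sigmoid as the concrete curve (the cutoff and steepness parameters are illustrative assumptions):

```python
import numpy as np

def sigmoid_weights(n_years, cutoff=5.0, steepness=1.0):
    """Logistic weights: roughly flat for years newer than `cutoff`,
    then a smooth decay toward 0, with no discontinuous step."""
    t = np.arange(n_years)                              # 0 = most recent year
    w = 1.0 / (1.0 + np.exp((t - cutoff) / steepness))  # logistic sigmoid
    return w / w.sum()                                  # normalize to sum 1
```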

Other methods: In some circumstances physics or probability dictates another distribution. For example, if signal propagation is Gaussian (which happens often enough in physics), then the relevant choice is the complementary error function, $\operatorname{erfc}$.
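A sketch of erfc-shaped weights, assuming SciPy is available (the scale parameter is an illustrative choice):

```python
import numpy as np
from scipy.special import erfc

def erfc_weights(n_years, scale=3.0):
    """Complementary-error-function weights: erfc(0) = 1 at the most
    recent year, decaying toward 0 for older years."""
    t = np.arange(n_years)
    w = erfc(t / scale)
    return w / w.sum()
```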

About justification: yes, there are many reasons to put more emphasis on recent data and assign weight $0$ to older data. For example, older economic data may be irrelevant to current projections, or you may be modeling some other effect that naturally decays in time.
