Solved – Definition of the function for exponentially decaying weighted average

exponential-smoothing, mean

I feel really thick asking this question, but I'm afraid I don't really understand the Wikipedia article explaining how to do a weighted average with exponentially decreasing weights.

I really have two questions, I guess:

  • I don't see anywhere in the Wiki article an explicit statement of the actual function to implement, something along the lines of exp_dec_avg(x1, x2, x3, ..., xn) = ... I think the reader is supposed to infer the definition of this function from the details provided, but there are gaps my unstatsy mind can't fill. Could someone explicitly state the definition for me?

  • More mundanely, I was assuming that exp_dec_avg(3, 3, 3) = 3, but my (probably horribly wrong) implementation does not return this. Am I right to expect it?

Best Answer

A weighted average of any sequence $x_1, x_2, \ldots, x_n$ with respect to a parallel sequence of weights $w_1, w_2, \ldots, w_n$ is the linear combination

$$(w_1 x_1 + w_2 x_2 + \cdots + w_n x_n) / (w_1 + w_2 + \cdots + w_n).\tag{1}$$
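As a quick numeric sketch (the specific values and weights here are my own, chosen only for illustration), formula $(1)$ agrees with R's built-in `weighted.mean`:

```r
x <- c(3, 1, 2)       # hypothetical data values
w <- c(1, 2, 3)       # hypothetical weights
sum(w * x) / sum(w)   # formula (1): (1*3 + 2*1 + 3*2) / 6 = 11/6
weighted.mean(x, w)   # R's built-in weighted average returns the same value
```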

An exponentially weighted average (EWA), by definition, uses a geometric sequence of weights

$$w_i = \rho^{n-i} w_0$$

for some number $\rho$. Since the common factor of $w_0 \ne 0$ will cancel in computing the fraction $(1)$, we may take $w_0=1$ if we wish. The EWA therefore depends on the weights only through the number $\rho$. Moreover, provided $\rho \ne 1$, the denominator of $(1)$ simplifies to $1 + \rho+\rho^2 + \cdots + \rho^{n-1} = (1-\rho^n)/(1-\rho)$, enabling us to write

$$\operatorname{EWA}_\rho(x_1, \ldots, x_n) = \frac{1-\rho}{1-\rho^n}(\rho^{n-1}x_1 + \rho^{n-2} x_2 + \cdots + x_n).$$
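To see that this closed form really is the weighted average $(1)$ with geometric weights, here is a small check (an illustrative sketch with an arbitrarily chosen $\rho$ and data):

```r
rho <- 0.5
x <- c(1, 2, 3)
n <- length(x)
w <- rho^((n - 1):0)                    # geometric weights rho^(n-i): 0.25, 0.5, 1
sum(w * x) / sum(w)                     # weighted-average form (1)
(1 - rho) / (1 - rho^n) * sum(w * x)    # closed form with the (1-rho)/(1-rho^n) prefactor
```

Both expressions return the same number, because $\sum_i \rho^{n-i} = (1-\rho^n)/(1-\rho)$.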

What makes these particularly nice is that as the sequence $(x_i)$ grows, its EWA is very simple to update, because

$$\eqalign{ \operatorname{EWA}_\rho(x_1, \ldots, x_n, x_{n+1}) &= \frac{1-\rho}{1-\rho^{n+1}}(\rho^n x_1 + \rho^{n-1} x_2 + \cdots + \rho x_n + x_{n+1}) \\ &= \rho\frac{1-\rho^{n}}{1-\rho^{n+1}}\operatorname{EWA}_\rho(x_1, \ldots, x_n) + \frac{1-\rho}{1-\rho^{n+1}}x_{n+1}.\tag{2}}$$

Although that looks messy, it's really very simple: the updated EWA is a weighted average of the previous EWA and the new value $x_{n+1}$. We don't need to hold on to all the $n$ preceding values: we only need the most recent EWA. Even better, usually $|\rho| \lt 1$ (to downweight the "older" values compared to the "newer" ones later in the sequence), which means once $n$ is sufficiently large, the values of $\rho^n$ and $\rho^{n+1}$ are negligible compared to $1$, whence

$$\frac{1-\rho^n}{1-\rho^{n+1}} \approx 1;\quad \frac{1-\rho}{1-\rho^{n+1}} \approx 1-\rho $$

to high accuracy. With this approximation in mind, the update $(2)$ becomes

$$\operatorname{EWA}_\rho(x_1, \ldots, x_n, x_{n+1}) = \rho\operatorname{EWA}_\rho(x_1, \ldots, x_n) + (1-\rho)x_{n+1}.\tag{2a}$$

This rule $(2a)$ is sometimes used to define the EWA, recursively.
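Both the exact update $(2)$ and its approximation $(2a)$ can be verified numerically. The sketch below uses a hypothetical helper `ewa_direct` (my own name, not from the answer) that computes the EWA from scratch:

```r
ewa_direct <- function(x, rho) {
  n <- length(x)
  w <- rho^((n - 1):0)     # geometric weights rho^(n-i)
  sum(w * x) / sum(w)
}

# Exact update (2): agrees with the direct formula to machine precision.
rho <- 0.8
x <- c(4, 1, 3, 2); x_new <- 5
n <- length(x)
exact <- rho * (1 - rho^n) / (1 - rho^(n + 1)) * ewa_direct(x, rho) +
         (1 - rho) / (1 - rho^(n + 1)) * x_new
all.equal(ewa_direct(c(x, x_new), rho), exact)   # TRUE

# Approximate update (2a): after many observations the recursion lands
# on the direct EWA, because rho^n has become negligible.
set.seed(17)
y <- rnorm(200)
z <- y[1]
for (yi in y[-1]) z <- rho * z + (1 - rho) * yi
abs(z - ewa_direct(y, rho))                      # negligibly small
```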

Now, provided $\rho \gt 0$ (which is very nearly always the case), it's straightforward to show that the weighted average lies between the extremes of the data values, so in particular

$$\max(x_1, \ldots, x_n) \ge \operatorname{EWA}_\rho(x_1, \ldots, x_n) \ge \min(x_1, \ldots, x_n).$$

When $x_1=x_2=\cdots=x_n=c$, say, the EWA must obviously be $c$ itself (which is both the max and min of the $x_i$).
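This directly answers the second question: under the exact definition, `exp_dec_avg(3, 3, 3)` must return `3`. A sketch of the check (if your implementation instead seeds the recursion $(2a)$ at zero, it will underestimate the constant, which is one common way to get a different answer):

```r
rho <- 0.6            # any rho > 0 works; the choice here is arbitrary
x <- rep(3, 3)
n <- length(x)
w <- rho^((n - 1):0)  # geometric weights
sum(w * x) / sum(w)   # exactly 3: the weights cancel for a constant sequence
```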

Here are some illustrations of how the EWA works.

Figures

At left is the $\operatorname{EWA}_\rho$ of $(1,2,\ldots, 10)$ as $\rho$ ranges from $0$ to $1$. As $\rho\to 1$, the EWA approaches the arithmetic mean because all the weights $\rho^{n-i}$ approach equal values of $1$. When $\rho \approx 0$, all but the last value ($x_{10}=10$) are heavily downweighted, producing an EWA close to the last value.

At middle and right are sequences of dots showing $x_1, \ldots, x_{50}$ and two EWA smooths: one for a high value of $\rho$, which downweights older values less, and one for a lower value of $\rho$, which, by downweighting older values more, tends to be less smooth but also closer to the recent $x$ values.

Here is an R implementation, via the function `ewa`, along with illustrations of its use to create the figures.

ewa <- function(x, z, rho=1) {
  if (missing(z)) {
    n <- length(x)
    if (n > 1) w <- rho^((n-1):0) else w <- rep(1, n)  # geometric weights rho^(n-i)
    z <- sum(x * w) / sum(w) # Compute the exact EWA from scratch, as in (1)
  } else {
    z <- rho * z + (1-rho)*x # Update a previous EWA `z` via the approximate rule (2a)
  }
  return(z)
}

par(mfrow=c(1,3))

f <- Vectorize(function(x) ewa(1:10, rho=x))
curve(f(x), xlim=c(0,1), lwd=2, main="EWA(1:10)",
      xlab=expression(rho), ylab="EWA(1,2,...,10)")

i <- 1:50
j <- i^0.5
x <- sin(j/0.53/max(j) * 2 * pi) * exp(j/max(j)) + rnorm(length(i), sd=0.25)
z <- 0
x.EWA.1 <- sapply(x, function(x) z <<- ewa(x, z, 0.95))
x.EWA.2 <- sapply(x, function(x) z <<- ewa(x, z, 0.75))
plot(i, x, xlab="i", ylab="Value", main=expression(paste("x and EWA(x), ", rho == 0.95)))
lines(i, x.EWA.1, col="Red", lwd=2)

plot(i, x, xlab="i", ylab="Value", main=expression(paste("x and EWA(x), ", rho == 0.75)))
lines(i, x.EWA.2, col="Blue", lwd=2)
par(mfrow=c(1,1))