Limiting distributions (attractors) associated with the discrete difference operator – application to error detection

combinatorics, discrete mathematics, dynamical systems, probability distributions, sequences-and-series

Let $Z = (Z_1,Z_2,\cdots)$ be a sequence of real numbers, either deterministic or with the $Z_i$'s being i.i.d. random variables. As usual, the successive discrete difference sequences are recursively defined as follows:

  • $D_{0} = Z$
  • If $n\geq 0$, then $D_{n+1} = (D_{n+1,1}, D_{n+1,2}, D_{n+1,3},\cdots)$ with
    $D_{n+1,k}=D_{n,k}-D_{n,k+1}$.

For a given $n$, all $D_{n,i}$'s have the same distribution. If the $Z_i$'s are i.i.d., then $\mbox{E}(D_{n,i}) = 0$ if $n>0$ and $\mbox{Var}(D_{n,i})=\mbox{Var}(Z_1)\cdot (2n)!/(n!)^2$. This is based on the fact that
$$D_{n,i} =\sum_{k=0}^n (-1)^k \frac{n!}{k!(n-k)!} Z_{k+i} .$$

Thus $$\mbox{Var}(D_{n,i})=\mbox{Var}(Z_1)\cdot\sum_{k=0}^n \Big(\frac{n!}{k!(n-k)!}\Big)^2 =\mbox{Var}(Z_1)\cdot \frac{(2n)!}{(n!)^2}.$$

We define the normalized sequence $C_n$ as $C_n = D_n/\sqrt{\mbox{Var}(D_{n,1})}$.
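For readers who want to experiment, here is a minimal Python sketch (the variable names, sample size and choice of input distribution are my own, not part of the original setup) that builds $D_n$ by repeated differencing and checks the variance identity above for i.i.d. uniform input:

```python
# Minimal sketch: build D_n by repeated differencing and check
# Var(D_{n,i}) = Var(Z_1) * (2n)! / (n!)^2 for i.i.d. Z_i.
import numpy as np
from math import comb

rng = np.random.default_rng(0)
n, N = 6, 2_000_000              # order of differencing, sample size (arbitrary)
Z = rng.uniform(0.0, 1.0, N)     # i.i.d. uniform on [0, 1], so Var(Z_1) = 1/12

D = Z.copy()
for _ in range(n):
    D = D[:-1] - D[1:]           # D_{n+1,k} = D_{n,k} - D_{n,k+1}

var_theory = (1.0 / 12.0) * comb(2 * n, n)   # (2n)!/(n!)^2 = C(2n, n)
print(D.var(), var_theory)       # the two values should be close

C = D / np.sqrt(var_theory)      # normalized sequence C_n
print(C.mean(), C.var())         # roughly 0 and 1
```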

1. Question

Find at least one attractor distribution (also called a limit distribution) $F$ such that $P(C_{n,i} < z) \rightarrow F(z)$ as $n\rightarrow\infty$. For a given $n$, all $C_{n,i}$'s have the same distribution. Of course $F$ may depend on $Z$. Note that the support domain of $F$ is infinite; see section 4.

2. Experimentation

I played with various deterministic sequences that behave much like random sequences, namely $Z_k = \{b^k \log 2\}$ where $b>1$ and the brackets denote the fractional part function. Keep in mind that in this context the $Z$ sequence is auto-correlated: the correlation between two successive values is equal to $1/b$ if $b$ is an integer, and the lag-$m$ autocorrelation is equal to $1/b^m$ (so the autocorrelations decay exponentially fast).
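A hedged sketch of that experiment (my own illustration; high-precision arithmetic via mpmath is assumed, since $b^k$ quickly exceeds double precision, and the parameter choices are mine):

```python
# Generate Z_k = { b^k log 2 } in high precision and compare the empirical
# lag-1 autocorrelation with the claimed value 1/b (here b = 3).
import numpy as np
from mpmath import mp, mpf, log, floor

mp.dps = 1000                       # plenty of digits: b^k has about k*log10(b) of them
b, N = 3, 1500
log2 = log(2)

def frac(x):                        # fractional part, kept in high precision
    return x - floor(x)

Z = np.array([float(frac(mpf(b) ** k * log2)) for k in range(1, N + 1)])

Zc = Z - Z.mean()
lag1 = float(np.dot(Zc[:-1], Zc[1:]) / np.dot(Zc, Zc))
print(lag1, 1.0 / b)                # empirical vs. claimed lag-1 autocorrelation 1/b
```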

Below is the limit distribution obtained if $b=1.1$. This is a non-standard case. The chart below features its empirical percentile distribution:

[Figure: empirical percentile distribution for $b = 1.1$]

And now the typical limit distribution (actually the percentile distribution) for a standard case, $b=3$:

[Figure: percentile distribution for $b = 3$]

It would be interesting to see what we obtain if $Z_k$ is uniform on $[0, 1]$, or normal, assuming the $Z_i$'s are i.i.d. this time.

3. Characteristic function

It is easy to verify that the characteristic function of $C_{n,1}$ is given by
$$\mbox{E}[\exp(itC_{n,1})] = \prod_{k=0}^n\mbox{E}\Big[\exp\Big(it\,(-1)^k \cdot \frac{(n!)^2}{k!\,(n-k)!\,\sqrt{(2n)!\,\mbox{Var}(Z_1)}} \cdot Z_{k+1}\Big)\Big]$$
where the $Z_k$'s are i.i.d. with the same distribution as $Z_1$. Thus if $Z_1$ has a stable distribution, then $C_{n,1}$ has a distribution from the same family. This is the case if $Z_1$ has a normal distribution.
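As a sanity check of the normal case, here is a rough Monte Carlo sketch (my own illustration; names and sample sizes are arbitrary) that builds $C_{n,1}$ from the binomial representation and tests it against $N(0,1)$:

```python
# For i.i.d. N(0,1) input, the normalized difference C_{n,1} should again be normal
# (in fact standard normal, since the squared normalized coefficients sum to 1).
import numpy as np
from math import comb
from scipy import stats

rng = np.random.default_rng(1)
n, trials = 8, 200_000
Z = rng.standard_normal((trials, n + 1))

coeffs = np.array([(-1) ** k * comb(n, k) for k in range(n + 1)], dtype=float)
C = Z @ coeffs / np.sqrt(comb(2 * n, n))        # one C_{n,1} per independent trial

print(C.mean(), C.var())                         # roughly 0 and 1
print(stats.kstest(C, "norm"))                   # typically does not reject normality
```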

4. The support domain of the limiting distribution is infinite

In order to prove this, let's focus on the maximum value of $D_{n,1}$, assuming $Z_1$ is Bernoulli with parameter $1/2$. The maximum is equal to $2^{n-1}$: take $Z_{k+1}=1$ exactly when the coefficient $(-1)^k\binom{n}{k}$ is positive, and note that these positive binomial coefficients sum to $2^{n-1}$. Also $C_{n,1}=D_{n,1}/\sqrt{\mbox{Var}(D_{n,1})} = (2\,n!/\sqrt{(2n)!})\cdot D_{n,1}$, since $\mbox{Var}(Z_1)=1/4$. Using the Stirling approximation for the factorials, the maximum of $C_{n,1}$ is thus of order $n^{1/4}$ and tends to infinity as $n$ tends to infinity.

The same logic can be used for the minimum. To get a general proof, the final step is to consider an arbitrary distribution for $Z_1$ rather than the Bernoulli one; some restrictions on the distribution of $Z_1$ may be needed for the result to hold in general. I tried a uniform distribution on $[0, 1]$ for $Z_1$, and at this stage it is still unclear to me whether the support domain is infinite. In this case, it looks like $\max D_{n,1} \propto \alpha^n$ with $\alpha$ very close to $2$, but possibly smaller. (Note that the supremum of $D_{n,1}$ in the uniform case is also $2^{n-1}$, but it carries essentially no probability mass, so an empirical maximum can fall well short of it.) We need $\alpha = 2$ for the support domain of $C_{\infty,1}$ to be infinite.
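A small sketch (my own, with arbitrary parameter choices) comparing the exact Bernoulli maximum of $C_{n,1}$ with $(\pi n)^{1/4}$, together with a crude Monte Carlo estimate of the maximum in the uniform case:

```python
# Support growth: exact maximum of C_{n,1} for Bernoulli(1/2) input vs. (pi*n)^(1/4),
# plus a crude Monte Carlo maximum for uniform [0,1] input.
import numpy as np
from math import comb, sqrt, pi

rng = np.random.default_rng(2)
for n in (5, 10, 20, 40):
    # Bernoulli(1/2): max D_{n,1} = 2^(n-1), Var(Z_1) = 1/4
    max_C_bernoulli = 2 ** (n - 1) / sqrt(0.25 * comb(2 * n, n))
    # Uniform [0,1]: estimate max C_{n,1} from random trials, Var(Z_1) = 1/12
    Z = rng.uniform(0.0, 1.0, (100_000, n + 1))
    coeffs = np.array([(-1) ** k * comb(n, k) for k in range(n + 1)], dtype=float)
    max_C_uniform = (Z @ coeffs).max() / sqrt(comb(2 * n, n) / 12.0)
    print(n, round(max_C_bernoulli, 3), round((pi * n) ** 0.25, 3), round(max_C_uniform, 3))
```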

5. Potential application

Successive applications of the discrete difference operator are very useful to identify errors or outliers in a sequence of numbers, as these errors propagate exponentially fast across $D_n$, as $n$ increases. The theory discussed here could lead to a statistical test to identify these errors. Specifically, the statistic of the test could be a combination of the following ratios:

$$T_n =\frac{\mbox{Var}(D_{n+1,i})}{\mbox{Var}(D_{n,i})} = \frac{2(2n+1)}{n+1}, n = 0, 1, 2,\cdots.$$

Any departure from the expected value of $T_n$, for $n=0, 1$ or $2$, could suggest that there is some unexpected pattern in your data, or possibly that your data points are auto-correlated. Indeed, I noticed this effect when comparing i.i.d. $Z_i$'s with auto-correlated ones, as in the examples pictured in this discussion (though the discrepancy tends to disappear as $n\rightarrow \infty$).
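Here is a sketch of how such a diagnostic could look in practice (only an illustration under my own assumptions, not a calibrated test): it compares the empirical ratios $T_n$ for an i.i.d. sample and for an AR(1)-type autocorrelated sample against the theoretical i.i.d. values.

```python
# Compare empirical variance ratios T_n = Var(D_{n+1,i}) / Var(D_{n,i}) with the
# i.i.d. value 2(2n+1)/(n+1); a deviation at small n signals autocorrelation.
import numpy as np

def variance_ratios(z, n_max=3):
    """Empirical T_0, ..., T_{n_max - 1} for a 1-D array z."""
    d, ratios = np.asarray(z, dtype=float), []
    for _ in range(n_max):
        d_next = d[:-1] - d[1:]
        ratios.append(d_next.var() / d.var())
        d = d_next
    return np.round(ratios, 3)

rng = np.random.default_rng(3)
iid = rng.standard_normal(200_000)
ar1 = np.empty_like(iid)                 # AR(1) with lag-1 autocorrelation 1/3,
ar1[0] = iid[0]                          # mimicking the b = 3 example above
for t in range(1, len(ar1)):
    ar1[t] = ar1[t - 1] / 3.0 + iid[t]

print(variance_ratios(iid))              # close to the theoretical values below
print(variance_ratios(ar1))              # T_0 noticeably below 2
print([round(2 * (2 * n + 1) / (n + 1), 3) for n in range(3)])   # 2.0, 3.0, 3.333
```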

Finally, keep in mind that as $n\rightarrow\infty$, the lag-$m$ auto-correlation computed on the $C_{\infty}$ sequence is strong for $m<6$. For i.i.d. $Z_i$'s this can be made exact: the lag-$m$ autocorrelation of $D_n$ (and hence of $C_n$) is $(-1)^m\binom{2n}{n+m}\big/\binom{2n}{n}$, which tends to $(-1)^m$ as $n\rightarrow\infty$ for fixed $m$. The table below gives a rough approximation.

[Table: approximate lag-$m$ auto-correlations of the $C_{\infty}$ sequence]
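For the i.i.d. case, the closed-form value given above can be cross-checked numerically; the sketch below (my own addition, with arbitrary parameters) compares it with the empirical lag-$m$ autocorrelation of the $C_n$ sequence:

```python
# For i.i.d. input, the lag-m autocorrelation of D_n (and C_n) is
# (-1)^m * C(2n, n+m) / C(2n, n); compare with the empirical value.
import numpy as np
from math import comb

rng = np.random.default_rng(4)
n, N = 20, 500_000
d = rng.standard_normal(N)
for _ in range(n):
    d = d[:-1] - d[1:]               # repeated differencing
c = (d - d.mean()) / d.std()         # standardized C_n sequence

for m in range(1, 7):
    empirical = float(np.dot(c[:-m], c[m:]) / len(c))
    theory = (-1) ** m * comb(2 * n, n + m) / comb(2 * n, n)
    print(m, round(empirical, 3), round(theory, 3))
```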

6. Conclusions

In many cases where $Z_1$ has a continuous distribution, it looks like the limiting distribution will be normal. If $Z_1$ has a discrete distribution, there are several possibilities; see below. Note that the references in the list below are not related to the difference operator, but to other systems that produce the same kinds of limiting distributions, so the cases listed apply to a large class of chaotic systems.

  • The limiting distribution can be normal or some other regular, smooth distribution, either well known or not
  • The limiting distribution can look very smooth yet be differentiable nowhere, while still being very close to normal or related (see here, here, and here)
  • The limiting distribution can look very un-smooth
  • The limiting distribution can be very peculiar, as in the first chart in this article, yet computable
  • The limiting distribution can be a piecewise linear mix of uniform distributions, or some other odd mixture
  • The limiting distribution can be very wild, totally chaotic as in the picture below (as if it were a Fourier series with a bunch of missing terms), not unlike the Weierstrass function

In some of the chaotic cases, the distribution has a fractal behavior (see here.)

[Figure: see description below]

The above figure shows the wild percentile distribution for $C_{\infty, 1}$ if $Z_k = \{ (e/2)^{k}\cdot\log 2\}$, corresponding to the last case in the above bullet list (the brackets represent the fractional part function).

This classification applies to many of the systems I have studied recently. My next step is to build a full taxonomy of the potential attractors (the limiting distributions) for this system and others.

Below is one last example where the $Z_i$'s are very heavily auto-correlated, with strong, essentially infinite-range auto-correlation. It corresponds to $Z_k = \{k e\}$.

[Figure: percentile distribution for $Z_k = \{k e\}$]

Best Answer

As you have written (taking $\mbox{Var}(Z_1)=1$),
$$ \mbox{E}\exp(itC_{n,1}) = \prod_{k=0}^n \varphi_Z\!\left((-1)^k\,\frac{(n!)^2}{k!\,(n-k)!\,\sqrt{(2n)!}}\,t\right). $$
For $Z\sim N(0,1)$ we have $\varphi_Z(s)=e^{-s^2/2}$, so the right-hand side becomes
$$ \exp\!\left(-\frac{t^2}{2}\sum_{k=0}^n \left(\frac{(n!)^2}{k!\,(n-k)!\,\sqrt{(2n)!}}\right)^{2}\right) = \exp\!\left(-\frac{t^2}{2}\cdot\frac{(n!)^2}{(2n)!}\sum_{k=0}^n\binom{n}{k}^{2}\right) = \exp\!\left(-\frac{t^2}{2}\right), $$
since $\sum_{k=0}^n\binom{n}{k}^2=(2n)!/(n!)^2$. Hence $C_{n,1}\sim N(0,1)$ for every $n$, and the limiting distribution is the standard normal.
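A tiny exact-arithmetic check of the identity used above (my own addition): the squared coefficients of $C_{n,1}$ sum to $1$ for every $n$.

```python
# Verify sum_k C(n,k)^2 * (n!)^2 / (2n)! = 1 exactly for small n.
from fractions import Fraction
from math import comb, factorial

for n in range(1, 15):
    s = sum(Fraction(comb(n, k) ** 2 * factorial(n) ** 2, factorial(2 * n)) for k in range(n + 1))
    assert s == 1
print("identity verified for n = 1..14")
```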
