[Math] approximate a probability distribution by moment matching

approximation-theoryca.classical-analysis-and-odesfourier analysismeasure-theorypr.probability

Suppose we want to approximate a real-valued random variable $X$ by a discrete random variable $Z$ with finitely many atoms. Suppose all moments of $X$ is finite. We want to match the moments of $X$ up to the $m^{\rm th}$ order:

(1) $\mathbb{E}[X^k] = \mathbb{E}[Z^k]$ for $k = 1, \ldots m$.

Here is a positive result, which is a simple consequence of convex analysis (Caratheodory's theorem): there exists $Z$ with at most $m+1$ atoms such that (1) holds.

Here are my questions:
1) Is there a converse result about this? Say $X$ has an absolutely continuous distribution supported on $\mathbb{R}$ (e.g. Gaussian). When $m$ is large, given that $Z$ has only $m$ atoms, can we conclude that we cannot approximate all $2m$ moments of $X$ well, i.e., can we lower bound the error
$\max_{1 \leq k \leq 2m}|\mathbb{E}[X^k] – \mathbb{E}[Z^k]|$? My intuition is the following: for a Gaussian $X$, $\mathbb{E}[X^k]$ grows like $k^{\frac{k}{2}}$ superexponentially. When we find a $Z$ who matches all moments of $X$ up to $m$, it cannot catch up with higher-order moments $X$; if $Z$ matches all moments from $m+1$ up to $2m$, then its low-order moments will be quite different from $X$.

2) Is there an efficient algorithm to compute the location and weights of the approximating discrete distribution? Does there exist a table to record these for approximating common distribution (e.g. Gaussian) for each fixed $m$? It could be very handy…

3) I heard from folklore that when (1) holds, the total variation distance between their distributions can be upper bounded by, say, $e^{-m}$ or $1/m!$. Of course, this won't be true for a discrete $Z$. But let's say $X$ and $Z$ both has smooth and bounded density on $\mathbb{R}$. Could this be true? Now two characteristic functions matches at $0$ up to $m^{\rm th}$ derivatives. They should be pretty close?

Best Answer

For (1) and (2) just forget about probability and recall everything you ever learned about orthogonal polynomials and the Gauss quadrature formulae.

3) is false as stated: there are plenty of Schwartz functions orthogonal to all polynomials, so you can have all moments coincide and still have a large distance (in any sense). Something like that may be true but I cannot think of any good formulation right away.

Related Question