Conditions for Local Lipschitz Stability of I-Projection – Probability and Information Theory

Tags: entropy, information-geometry, it.information-theory, pr.probability

This post builds on an earlier question; I'll begin by quoting its setting.


Background from Previous Question:

$\newcommand\SS{P}\newcommand\TT{Q}$Call a Gaussian probability measure $\SS$ on $\mathbb{R}^d$ isotropic if its covariance matrix $\Sigma$ is diagonal with non-vanishing determinant; i.e., $\Sigma_{i,i}>0$ for $i=1,\dots,d$ and $\Sigma_{i,j}=0$ for all $i\neq j$.
Note: My definition of "isotropic" includes "the usual isotropic Gaussian measures," which, from my limited understanding, are assumed to have a covariance of the form $\sigma I_d$ for some $\sigma>0$.
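To make the difference between the two notions concrete, here is a small Python sketch. The function names are hypothetical, covariances are given as lists of rows, and exact floating-point comparison is used for simplicity.

```python
def is_isotropic_as_defined(cov):
    """Question's definition: diagonal covariance with strictly positive diagonal."""
    d = len(cov)
    return all(
        (cov[i][j] > 0) if i == j else (cov[i][j] == 0)
        for i in range(d)
        for j in range(d)
    )


def is_scalar_isotropic(cov):
    """Usual definition: cov = sigma * I_d for some sigma > 0."""
    diagonal_values = {cov[i][i] for i in range(len(cov))}
    return is_isotropic_as_defined(cov) and len(diagonal_values) == 1


diag = [[1.0, 0.0], [0.0, 4.0]]
scalar = [[2.0, 0.0], [0.0, 2.0]]
print(is_isotropic_as_defined(diag), is_scalar_isotropic(diag))      # True False
print(is_isotropic_as_defined(scalar), is_scalar_isotropic(scalar))  # True True
```

Every $\sigma I_d$ passes both checks, while a diagonal matrix with unequal entries passes only the first, which is exactly the sense in which the question's definition is broader.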

Let $\mathcal{P}$ be the set of isotropic Gaussian probability measures on $\mathbb{R}^d$, and let $\mathcal{Q}$ be the set of probability measures on $\mathbb{R}^d$ with a Lebesgue density, equipped with the total variation (TV) distance.

Consider the information projection (or I-projection) defined by
\begin{align}
\pi:\mathcal{Q} &\rightarrow \mathcal{P}
\\
\pi(\TT) &:= \operatorname*{argmin}_{\SS\in \mathcal{P}}\, D(\SS\parallel\TT)
\end{align}
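For intuition, the minimization defining $\pi$ can be approximated numerically in $d=1$, where every $N(m,s^2)$ with $s>0$ is isotropic. The Python sketch below is an assumption-laden illustration, not a method from the post: it writes $D(P\parallel Q)=-H(P)-\mathbb{E}_P[\log q]$, evaluates the expectation with a fixed Gaussian quadrature grid (accurate only when $\log q$ is smooth on the scale of $0.1\,s$), and minimizes over $(m,\log s)$ with a zooming grid search swapped in as a simple heuristic. All function names are hypothetical.

```python
import math

# Symmetric nodes z_k = 0.1*k on [-10, 10] with Riemann weights for E[f(Z)],
# Z ~ N(0, 1); the mass outside [-10, 10] is negligible.
_Z = [0.1 * k for k in range(-100, 101)]
_W = [0.1 * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi) for z in _Z]


def kl_normal_to_target(m, s, log_q):
    """Approximate D(N(m, s^2) || Q) in d = 1, where log_q is Q's log-density.

    Uses D(P || Q) = -H(P) - E_P[log q] with H(N(m, s^2)) = 0.5*log(2*pi*e*s^2).
    """
    neg_entropy = -0.5 * math.log(2 * math.pi * math.e * s * s)
    cross = sum(w * log_q(m + s * z) for z, w in zip(_Z, _W))
    return neg_entropy - cross


def i_project_1d(log_q, center=(0.0, 0.0), half_width=(5.0, 3.0), grid=11, rounds=25):
    """Zooming grid search for argmin over (m, s) of D(N(m, s^2) || Q).

    Searches over (m, log s) so that s > 0; each round keeps the best point found
    so far and halves the window around it. Heuristic: it may miss the global
    minimum if the objective has several well-separated local minima.
    """
    (cm, cl), (wm, wl) = center, half_width
    best = None
    for _ in range(rounds):
        for i in range(grid):
            m = cm - wm + 2 * wm * i / (grid - 1)
            for j in range(grid):
                ls = cl - wl + 2 * wl * j / (grid - 1)
                val = kl_normal_to_target(m, math.exp(ls), log_q)
                if best is None or val < best[0]:
                    best = (val, m, ls)
        _, cm, cl = best
        wm, wl = wm / 2, wl / 2
    val, m, ls = best
    return m, math.exp(ls), val


def log_normal(x, mu, sigma):
    """Log-density of N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


# Sanity check: if Q is itself Gaussian, D(P || Q) = 0 only at P = Q,
# so the I-projection should recover Q's parameters.
m, s, val = i_project_1d(lambda x: log_normal(x, 0.5, 2.0))
print(m, s, val)
```

For a Gaussian target the objective has the closed form $\log(\sigma/s)+\frac{s^2+(m-\mu)^2}{2\sigma^2}-\frac12$, which is a convenient check on the quadrature.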


Definitions:

Let $\mathcal{B}(\mathbb{R}^d)$ denote the Borel $\sigma$-algebra on $\mathbb{R}^d$, let $\mu$ denote the $d$-dimensional Lebesgue measure, and let $\nu$ denote the standard Gaussian probability measure on $\mathbb{R}^d$. Let $\mathcal{W}_2$ denote the $2$-Wasserstein distance.

Fix a parameter $R>0$ and let $\mathcal{Q}_R$ denote the set of Borel probability measures $\mathbb{Q}$ on $\mathbb{R}^d$ satisfying:

  • $\mathbb{Q}\ll \mu$,
  • $ \frac{d\mathbb{Q}}{d\mu} \in L^2(\mathcal{B}(\mathbb{R}^d),\mu)$,
  • $\mathcal{W}_2(\mathbb{Q},\nu)\le R$.
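In $d=1$ these conditions can be checked directly on examples. A Gaussian mixture is absolutely continuous with a bounded, integrable (hence square-integrable) density, so only the $\mathcal{W}_2$ condition needs work. The Python sketch below (helper names are hypothetical; it relies on the standard-library `statistics.NormalDist`) computes $\mathcal{W}_2(\mathbb{Q},\nu)$ through the monotone transport map, which is optimal in one dimension.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # nu = N(0, 1) in d = 1


def mixture_cdf(weights, comps):
    """CDF of the 1-D Gaussian mixture sum_k weights[k] * comps[k]."""
    return lambda x: sum(w * c.cdf(x) for w, c in zip(weights, comps))


def quantile(cdf, u, tol=1e-12):
    """Solve cdf(x) = u for a continuous increasing cdf by bracketing and bisection."""
    lo, hi = -1.0, 1.0
    while cdf(lo) > u:
        lo *= 2
    while cdf(hi) < u:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def w2_to_standard_normal(cdf, n=241, zmax=6.0):
    """Approximate W2(Q, nu) in d = 1 from the CDF of Q.

    In 1-D the optimal map from nu to Q is T = F_Q^{-1} o Phi, so
    W2^2(Q, nu) = E[(T(Z) - Z)^2] with Z ~ N(0, 1). The expectation is a Riemann
    sum truncated to [-zmax, zmax], since far-tail CDF values lose relative
    precision in floating point.
    """
    h = 2 * zmax / (n - 1)
    total = 0.0
    for k in range(n):
        z = -zmax + k * h
        t = quantile(cdf, STD.cdf(z))
        total += h * STD.pdf(z) * (t - z) ** 2
    return math.sqrt(total)


# Example (illustrative values): Q = 0.95 N(0, 1) + 0.05 N(1, 0.1^2) and R = 0.5.
q_cdf = mixture_cdf([0.95, 0.05], [NormalDist(0, 1), NormalDist(1, 0.1)])
R = 0.5
print(w2_to_standard_normal(q_cdf) <= R)  # True
```

For a single Gaussian $N(m,\sigma^2)$ the result can be compared with the closed form $\mathcal{W}_2^2=m^2+(\sigma-1)^2$.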

Is the map $\mathbb{Q}\mapsto \pi(\mathbb{Q})$ locally Lipschitz on $\mathcal{Q}_R$?
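One way to make the question precise is the following. This relies on two assumptions: the post fixes the TV distance on $\mathcal{Q}$ but does not say which metric $d_{\mathcal{P}}$ is used on $\mathcal{P}$ (TV or $\mathcal{W}_2$ would be natural), and $\pi$ is assumed single-valued where it is evaluated. The requirement is that for every $\mathbb{Q}_0\in\mathcal{Q}_R$ there are $r>0$ and $L<\infty$ with

$$
d_{\mathcal{P}}\big(\pi(\mathbb{Q}),\pi(\mathbb{Q}')\big)\le L\,\mathrm{TV}(\mathbb{Q},\mathbb{Q}')
\qquad\text{for all }\mathbb{Q},\mathbb{Q}'\in\mathcal{Q}_R\text{ with }\mathrm{TV}(\mathbb{Q},\mathbb{Q}_0)<r,\ \mathrm{TV}(\mathbb{Q}',\mathbb{Q}_0)<r.
$$

A negative answer therefore only needs measures $\mathbb{Q}_h,\mathbb{Q}_{-h}\in\mathcal{Q}_R$ that converge to some $\mathbb{Q}_0\in\mathcal{Q}_R$ in TV as $h\downarrow 0$ while $d_{\mathcal{P}}\big(\pi(\mathbb{Q}_h),\pi(\mathbb{Q}_{-h})\big)$ stays bounded away from $0$. The right-hand side $L\,\mathrm{TV}(\mathbb{Q}_h,\mathbb{Q}_{-h})$ then tends to $0$ while the left-hand side does not.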

Best Answer

The answer is still no, for every $R>0$, however small.

Indeed, let $d=1$, and let $g$ be the pdf of $\nu$, i.e., the standard normal pdf. Let $Q_h$ be the probability measure with pdf $(1-p)g+pq_h$, where $q_h$ is as in the previous answer and $p\in(0,1)$ is a fixed number close enough to $0$ that the condition $\mathcal{W}_2(Q_h,\nu)\le R$ holds for all $h$ under consideration. Then the argument given in the previous answer carries over:

Since $a$ is small, $Q_h$ is close to the mixture, with weights $1-p$ and $p$, of (i) $\nu$ and (ii) the mixture of the narrow normal distributions $N(1,a^2)$ and $N(-1,a^2)$ with slightly unequal weights $c_h\,(1+h)$ and $c_h\,(1-h)$, respectively. So a minimizer $P_h$ of the Kullback–Leibler divergence $D(P\parallel Q_h)$ over $P$ should be close to $(1-p)\nu+pN(1,a^2)$ or to $(1-p)\nu+pN(-1,a^2)$, according as the small perturbation $h$ is $>0$ or $<0$. Thus an arbitrarily small change from, say, $h>0$ to $-h<0$ results in a non-negligible change from $P_h\approx (1-p)\nu+pN(1,a^2)$ to $P_{-h}\approx (1-p)\nu+pN(-1,a^2)$. (If $h=0$, there are two minimizers.)
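The requirement on $p$ in this construction can be met uniformly in $h$ by mixing couplings. The sketch below assumes $M:=\sup_h\mathcal{W}_2(q_h,\nu)<\infty$. This holds, e.g., if the $q_h$ are Gaussian mixtures with uniformly bounded means and variances. It also writes $q_h$ for the probability measure with density $q_h$. Let $\gamma_h$ be an optimal coupling of $(q_h,\nu)$, and let $\gamma_0$ be the law of $(X,X)$ with $X\sim\nu$. Then $(1-p)\gamma_0+p\gamma_h$ is a coupling of $\big((1-p)\nu+pq_h,\,(1-p)\nu+p\nu\big)=(Q_h,\nu)$, so

$$
\mathcal{W}_2^2(Q_h,\nu)\le (1-p)\cdot 0+p\,\mathcal{W}_2^2(q_h,\nu)\le p\,M^2 .
$$

Hence any $p\in(0,1)$ with $p\le (R/M)^2$ gives $\mathcal{W}_2(Q_h,\nu)\le R$ for every $h$. The other two conditions are inherited from $q_h$: $Q_h\ll\mu$, and $(1-p)g+pq_h\in L^2(\mu)$ whenever $q_h\in L^2(\mu)$, since $g\in L^2(\mu)$.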
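The mirror-image structure behind this argument can be checked numerically. The Python sketch below is hypothetical throughout. It works in $d=1$ with Gaussian candidates $N(m,s^2)$. It uses the stand-in $q_h=\tfrac12(1+h)N(1,a^2)+\tfrac12(1-h)N(-1,a^2)$ (i.e. $c_h=\tfrac12$), which may differ from the previous answer's exact $q_h$, and illustrative values of $p$, $a$, $h$. It verifies two facts that hold for any such symmetric construction. First, $D(N(m,s^2)\parallel Q_h)=D(N(-m,s^2)\parallel Q_{-h})$, so the minimizers for $-h$ are the mirror images of those for $h$. Second, for $h>0$ the candidate $N(1,a^2)$ has smaller divergence than its mirror image $N(-1,a^2)$. It does not by itself locate the global minimizer.

```python
import math

# Hypothetical parameters, chosen for illustration only.
P_MIX, A = 0.05, 0.1

# Nodes symmetric about 0 (0.1*k and 0.1*(-k) are exact negatives) with
# Riemann weights for E[f(Z)], Z ~ N(0, 1).
_Z = [0.1 * k for k in range(-100, 101)]
_W = [0.1 * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi) for z in _Z]


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def log_q(x, h):
    """Log-density of the stand-in Q_h = (1 - p) N(0, 1) + p q_h (assumed q_h)."""
    q_h = 0.5 * (1 + h) * normal_pdf(x, 1, A) + 0.5 * (1 - h) * normal_pdf(x, -1, A)
    return math.log((1 - P_MIX) * normal_pdf(x, 0, 1) + P_MIX * q_h)


def kl(m, s, h):
    """Approximate D(N(m, s^2) || Q_h) = -H(P) - E_P[log q_h-mixture]."""
    neg_entropy = -0.5 * math.log(2 * math.pi * math.e * s * s)
    return neg_entropy - sum(w * log_q(m + s * z, h) for z, w in zip(_Z, _W))


# Mirror symmetry: flipping the sign of both m and h leaves D unchanged.
print(kl(0.7, 1.3, 0.2), kl(-0.7, 1.3, -0.2))
# For h > 0, the narrow candidate at +1 beats its mirror image at -1.
print(kl(1.0, A, 0.2), kl(-1.0, A, 0.2))
```

The first identity follows from $Q_{-h}(x)=Q_h(-x)$. The second holds because $\log Q_h(x)-\log Q_h(-x)$ is odd and positive for $x>0$ when $h>0$.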