Show that $\chi^2(\nu\kappa,\mu)\le c\chi^2(\nu,\mu)$ for all probability measures $\nu$ implies $\operatorname{Var}_\mu[\kappa f]\le c\operatorname{Var}_\mu[f]$ for all $f\in L^2(\mu)$

Tags: chi-squared, functional-inequalities, markov-process, measure-theory, probability-theory

Let $(E,\mathcal E)$ be a measurable space and $$\chi^2(\nu,\mu):=\begin{cases}\displaystyle\mu\left|\frac{{\rm d}\nu}{{\rm d}\mu}-1\right|^2=\mu\left|\frac{{\rm d}\nu}{{\rm d}\mu}\right|^2-1&\text{, if }\nu\ll\mu\\\infty&\text{, otherwise}\end{cases}$$ denote the $\chi^2$-distance of $\mu$ and $\nu$ for probability measures $\mu$ and $\nu$ on $(E,\mathcal E)$.
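To make the definition concrete, here is a minimal sketch on a finite state space, where measures are lists of weights and $\frac{{\rm d}\nu}{{\rm d}\mu}$ is the pointwise ratio. The helper name `chi2` and the example weights are illustrative assumptions, not part of the question. It checks that the two expressions in the definition agree and that the distance is infinite when $\nu\not\ll\mu$.

```python
from fractions import Fraction as F

def chi2(nu, mu):
    """chi^2-distance of nu from mu on a finite state space (lists of weights)."""
    # nu << mu fails as soon as nu charges a point that mu does not.
    if any(m == 0 and n != 0 for n, m in zip(nu, mu)):
        return float("inf")
    # mu|dnu/dmu - 1|^2, summing only over points with mu > 0.
    return sum(m * (n / m - 1) ** 2 for n, m in zip(nu, mu) if m != 0)

# Hypothetical example measures (exact rational weights).
mu = [F(1, 2), F(1, 4), F(1, 4)]
nu = [F(1, 4), F(1, 4), F(1, 2)]

first_form = chi2(nu, mu)                                  # mu|dnu/dmu - 1|^2
second_form = sum(n * n / m for n, m in zip(nu, mu)) - 1   # mu|dnu/dmu|^2 - 1
```

Exact fractions are used so that the two forms can be compared with `==` rather than up to floating-point error.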

Let $\kappa$ be a Markov kernel on $(E,\mathcal E)$ and $\mu$ be a probability measure on $(E,\mathcal E)$ invariant with respect to $\kappa$ (i.e.$^1$ $\mu\kappa=\mu$). Assume there is a $c\ge0$ with $$\chi^2(\nu\kappa,\mu)\le c\chi^2(\nu,\mu)\tag1$$ for all probability measures $\nu$ on $(E,\mathcal E)$. How can we conclude that $$\operatorname{Var}_\mu\left[\kappa f\right]\le c\operatorname{Var}_\mu\left[f\right]\;\;\;\text{for all }f\in L^2(\mu)\tag2?$$
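As a sanity check of what $(1)$ and $(2)$ say, the following sketch evaluates both ratios on a small non-reversible example: a lazy cyclic walk on three points with uniform invariant measure. The kernel, the test measure `nu` and the test function `f` are assumptions chosen for illustration; a single spot check like this does not prove either inequality.

```python
from fractions import Fraction as F

def push(nu, K):
    """(nu K)(y) = sum_x nu(x) K[x][y]: the kernel acting on a measure."""
    n = len(nu)
    return [sum(nu[x] * K[x][y] for x in range(n)) for y in range(n)]

def apply_kernel(K, f):
    """(K f)(x) = sum_y K[x][y] f(y): the kernel acting on a function."""
    return [sum(k * v for k, v in zip(row, f)) for row in K]

def chi2(nu, mu):
    # Uses the form mu|dnu/dmu|^2 - 1; assumes mu > 0 everywhere.
    return sum(n * n / m for n, m in zip(nu, mu)) - 1

def var(f, mu):
    mean = sum(m * v for m, v in zip(mu, f))
    return sum(m * (v - mean) ** 2 for m, v in zip(mu, f))

half = F(1, 2)
K = [[half, half, 0], [0, half, half], [half, 0, half]]  # stay or step forward
mu = [F(1, 3)] * 3
assert push(mu, K) == mu        # mu is invariant: mu K = mu

nu = [F(1), F(0), F(0)]         # point mass, a probability measure
f = [F(1), F(0), F(0)]          # indicator of the first state

ratio_chi2 = chi2(push(nu, K), mu) / chi2(nu, mu)   # left/right side of (1)
ratio_var = var(apply_kernel(K, f), mu) / var(f, mu)  # left/right side of (2)
```

For this particular chain both ratios come out equal, which is consistent with the same constant $c$ appearing in $(1)$ and $(2)$.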

Let $f\in L^2(\mu)$ and assume, for the moment, that $^2$ $\mu f=0$. Then $$\operatorname{Var}_\mu\left[\kappa f\right]=\left\|\kappa f\right\|_{L^2(\mu)}^2\tag3$$ and $$\operatorname{Var}_\mu\left[f\right]=\left\|f\right\|_{L^2(\mu)}^2.\tag4$$ As in any Hilbert space, $$\left\|h\right\|_{L^2(\mu)}=\sup_{\substack{g\in L^2(\mu)\\\left\|g\right\|_{L^2(\mu)}\le1}}|\langle h,g\rangle_{L^2(\mu)}|\;\;\;\text{for all }h\in L^2(\mu)\tag5$$ and hence it would be sufficient to show $$|\langle\kappa f,g\rangle_{L^2(\mu)}|\le\sqrt c\left\|f\right\|_{L^2(\mu)}\left\|g\right\|_{L^2(\mu)}\;\;\;\text{for all }g\in L^2(\mu)\tag6.$$


In order to show $(6)$, let's consider $g\in L^2(\mu)$ with $g\ge0$ and $\mu g=1$. Then $$\nu:=g\mu$$ (the measure with density $g$ wrt $\mu$) is a probability measure on $(E,\mathcal E)$. Assume $\nu\kappa\ll\mu$ and let $$h:=\frac{{\rm d}\nu\kappa}{{\rm d}\mu}.$$ By $(1)$, $$\mu|h-1|^2=\chi^2(\nu\kappa,\mu)\le c\chi^2(\nu,\mu)=c\left(\left\|g\right\|_{L^2(\mu)}^2-1\right)\le c\left\|g\right\|_{L^2(\mu)}^2\tag7.$$ Since $\mu f=0$, $$\langle\kappa f,g\rangle_{L^2(\mu)}=\nu(\kappa f)=(\nu\kappa)f-\mu f=\mu\left((h-1)f\right)\tag8$$ and hence $$|\langle\kappa f,g\rangle_{L^2(\mu)}|\le\mu|(h-1)f|\le\sqrt{\mu|h-1|^2}\left\|f\right\|_{L^2(\mu)}\le\sqrt c\left\|g\right\|_{L^2(\mu)}\left\|f\right\|_{L^2(\mu)}\tag9$$ by Hölder's inequality.

Clearly, the above consideration extends easily to all $g\in L^2(\mu)\setminus\left\{0\right\}$ with $g\ge0$ by considering $\frac g{\mu g}$.
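The chain of identities in $(8)$ and the Cauchy–Schwarz step in $(9)$ can be checked on a finite example. The sketch below assumes a lazy cyclic walk on three points with uniform $\mu$, a mean-zero `f` and a nonnegative density `g` with $\mu g=1$; all of these are hypothetical choices, and `dens` plays the role of $h$ in the text.

```python
from fractions import Fraction as F

half = F(1, 2)
K = [[half, half, 0], [0, half, half], [half, 0, half]]  # hypothetical kernel
mu = [F(1, 3)] * 3                                       # invariant for K
f = [F(1), F(-1), F(0)]                                  # mu f = 0
g = [F(3, 2), F(3, 2), F(0)]                             # g >= 0, mu g = 1

S = range(3)
Kf = [sum(K[x][y] * f[y] for y in S) for x in S]         # kappa f
nu = [g[x] * mu[x] for x in S]                           # nu = g mu
nuK = [sum(nu[x] * K[x][y] for x in S) for y in S]       # nu kappa
dens = [nuK[y] / mu[y] for y in S]                       # d(nu kappa)/d mu

inner = sum(mu[x] * Kf[x] * g[x] for x in S)             # <kappa f, g>
nuK_f = sum(nuK[y] * f[y] for y in S)                    # (nu kappa) f
centered = sum(mu[y] * (dens[y] - 1) * f[y] for y in S)  # mu((h - 1) f)

chi2_nuK = sum(mu[y] * (dens[y] - 1) ** 2 for y in S)    # mu|h - 1|^2
f_norm_sq = sum(mu[y] * f[y] ** 2 for y in S)            # ||f||^2
```

Comparing squares, `inner ** 2 <= chi2_nuK * f_norm_sq`, avoids taking square roots of fractions.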

However, it's not clear to me what we can do if $\nu\kappa\not\ll\mu$. Note that this case cannot occur when $\mu$ is reversible wrt $\kappa$, i.e. $$\int\mu({\rm d}x)\int\kappa(x,{\rm d}y)f(x,y)=\int\mu({\rm d}y)\int\kappa(y,{\rm d}x)f(x,y)\tag{10}$$ for all bounded $\mathcal E\otimes\mathcal E$-measurable $f:E\times E\to\mathbb R$, since then $\nu\kappa=(\kappa g)\mu$. So, do we need to assume reversibility?

Moreover, it's not clear to me how we can extend the considerations above to all $g\in L^2(\mu)$ (which are not necessarily nonnegative) which would finally yield $(6)$.


$^1$ $\mu\kappa$ denotes the composition of $\mu$ and $\kappa$.

$^2$ As usual, $\kappa f:=\int\kappa(\;\cdot\;,{\rm d}y)f(y)$ and, analogously, $\mu f:=\int f\:{\rm d}\mu$.

Best Answer

Instead of patching the holes you mentioned in your question, I found it simplest to use a fresh idea: applying the variational characterization of $\chi^2$ directly, to show that inequality $(1)$ yields a more general variant of $(2)$, from which the desired result will quickly follow.

Lemma. Let $\mu$ be a probability measure and let $\kappa$ be a kernel satisfying $\mu\kappa = \mu$. Let $c\geq 0$ be a constant such that $\chi^2(\nu\kappa,\mu)\leq c\chi^2(\nu,\mu)$ for all probability measures $\nu$. Then $$ \bigl[\mu(g\cdot\kappa f) - \mu g \cdot \mu f\bigr]^2 \leq c \textrm{Var}_{\mu}(g) \textrm{Var}_{\mu}(f)\text{ for all }f,g\in L^2(\mu). $$

Proof. First suppose that $f\geq 0$ and $\mu(f)=1$, so that $\nu=f\mu$ is a probability measure. Then $\chi^2(\nu,\mu)=\textrm{Var}_{\mu}(f)$. Rewriting $\chi^2(\nu\kappa,\mu)$ using the variational characterization of $\chi^2$ presented in Lemma 7.3 (ii) of the notes you linked to, and noting that $(\nu\kappa)g=\mu(f\cdot\kappa g)$, we thus obtain that $$ \bigl[\mu(f\cdot \kappa g)-\mu(g)\bigr]^2\leq c\textrm{Var}_{\mu}(f)\mu(g^2), $$ for all bounded $g\in L^2(\mu)$. Replacing $g$ with $g-\mu(g)$ and simplifying leads to $$ \bigl[\mu(f\cdot \kappa g)-\mu(f)\cdot\mu(g)\bigr]^2\leq c\textrm{Var}_{\mu}(f)\textrm{Var}_{\mu}(g), $$ again for all bounded $g\in L^2(\mu)$. Recall our assumptions that $f\geq 0$ and $\mu(f)=1$ that were made in order to derive this inequality. Now observe that both sides scale by the same factor when $f$ is multiplied by a constant, so we may dispense with the assumption $\mu(f)=1$. Furthermore, the inequality is unchanged after adding a constant $a$ to $f$: the right side is clearly unaffected, and the left side is unaffected since $\mu(a\cdot \kappa g)=a\mu(g)$ by the hypothesis $\mu\kappa=\mu$. Thus, the condition $f\geq 0$ can be replaced by the weaker condition that $f$ is bounded from below. Finally, an approximation argument allows us to remove the boundedness assumptions from both $f$ and $g$, yielding the claim (with the roles of $f$ and $g$ interchanged relative to the statement of the lemma). $\square$

Taking $g=\kappa f$ in the lemma, and using $\mu(\kappa f)=\mu f$ to identify the left side as a squared variance, yields $$ \bigl[\textrm{Var}_{\mu}(\kappa f) \bigr]^2 \leq c \textrm{Var}_{\mu}(\kappa f) \textrm{Var}_{\mu}(f). $$ Thus when $\textrm{Var}_{\mu}(\kappa f)>0$ we can divide through to obtain $$ \textrm{Var}_{\mu}(\kappa f) \leq c \textrm{Var}_{\mu}(f) $$ as desired, and in the remaining case $\textrm{Var}_{\mu}(\kappa f)=0$ the inequality holds trivially.
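The lemma and the choice $g=\kappa f$ can be spot-checked numerically. The sketch below assumes a lazy cyclic walk on three points with uniform $\mu$ and takes $c=1/4$ as the contraction constant for that chain; this value of `c`, the kernel and the test functions are assumptions of the example, not something the answer establishes. `lemma_gap` returns the right side minus the left side of the lemma's inequality, so a nonnegative value is consistent with it.

```python
from fractions import Fraction as F

half = F(1, 2)
K = [[half, half, 0], [0, half, half], [half, 0, half]]  # hypothetical kernel
mu = [F(1, 3)] * 3                                       # invariant for K
c = F(1, 4)                                              # assumed constant in (1)

def mean(u):
    return sum(m * v for m, v in zip(mu, u))

def var(u):
    return mean([v * v for v in u]) - mean(u) ** 2

def apply_kernel(u):
    return [sum(k * v for k, v in zip(row, u)) for row in K]

def lemma_gap(f, g):
    """c Var(g) Var(f) - [mu(g * Kf) - mu(g) mu(f)]^2."""
    Kf = apply_kernel(f)
    cov = mean([a * b for a, b in zip(g, Kf)]) - mean(g) * mean(f)
    return c * var(g) * var(f) - cov ** 2

f = [F(1), F(-1), F(0)]
g = [F(3, 2), F(3, 2), F(0)]

gap_generic = lemma_gap(f, g)                  # an arbitrary pair (f, g)
gap_diagonal = lemma_gap(f, apply_kernel(f))   # the choice g = kappa f
```

In this example the diagonal choice gives a gap of exactly zero, so the final inequality holds with equality for this `f`.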


Below this line are earlier attempts at answering the question, kept so that anyone interested can read through the evolution of this answer. The following was an attempt to complete the solution sketched in the question, but a hole was uncovered in the final step due to possibly non-negative cross terms appearing in the square. Further below this answer is an earlier solution, which misapplied the $\chi^2$ hypothesis using an incorrect identity $(f\mu)\kappa=(\kappa f)\mu$, effectively solving a different problem than the one which was asked.
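To see why the identity $(f\mu)\kappa=(\kappa f)\mu$ cannot be used without reversibility, here is a small counterexample sketch. It assumes a deterministic cycle on three points with uniform invariant measure, which is not reversible; the kernel and the density `f` are hypothetical choices.

```python
from fractions import Fraction as F

K = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # deterministic cycle 0 -> 1 -> 2 -> 0
mu = [F(1, 3)] * 3                      # uniform, invariant for K
f = [F(3), F(0), F(0)]                  # f >= 0, mu f = 1

S = range(3)
nu = [f[x] * mu[x] for x in S]                         # f mu (point mass at 0)
nu_K = [sum(nu[x] * K[x][y] for x in S) for y in S]    # (f mu) kappa
Kf = [sum(K[x][y] * f[y] for y in S) for x in S]       # kappa f
Kf_mu = [Kf[x] * mu[x] for x in S]                     # (kappa f) mu
```

Here $(f\mu)\kappa$ moves the mass forward along the cycle, while $(\kappa f)\mu$ places it one step backward, so the two measures differ.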

You have shown above that the result follows from $$ \bigl|\langle \kappa f,g\rangle_{L^2(\mu)}\bigr|\leq \sqrt{c}\|f\|_{L^2(\mu)}\|g\|_{L^2(\mu)}\quad \text{for all }g\in L^2(\mu),\qquad (6) $$ which you proved for all $g\in L^2(\mu)$ such that $g\geq 0$ and $\mu(g)=1$ and for which $(g\mu)\kappa\ll \mu$. In fact, the final condition is unnecessary.

Claim. For all $g\in L^2(\mu)$ such that $g\geq 0$ and $\mu(g)=1$, we have that $(g\mu)\kappa\ll\mu$.

Proof. By $(1)$ we have that $\chi^2\bigl[(g\mu)\kappa,\mu\bigr]\leq c\chi^2(g\mu,\mu)<\infty$. Thus by the definition of $\chi^2$ you have given, it follows that $(g\mu)\kappa\ll \mu$. $\square$

Thus, it remains to show that if $(6)$ holds for all $g\in L^2(\mu)$ such that $g\geq 0$ and $\mu(g)=1$, then it holds for all $g\in L^2(\mu)$. You already pointed out that a simple scaling argument allows us to dispense with the condition $\mu(g)=1$.

To complete the final step, we take an arbitrary $g\in L^2(\mu)$ and decompose it into positive and negative parts as $g=g_+-g_-$ where $g_+=\max(g,0)$ and $g_-=\max(-g,0)$. Note that $g_{\pm}\geq 0$ and $g_+\cdot g_-=0$. Thus, $g^2=g_+^2+g_-^2$ and therefore $$ \mu(g^2)=\mu(g_+^2)+\mu(g_-^2). $$ Observe that $$ \bigl|\langle \kappa f,g\rangle_{L^2(\mu)}\bigr|^2= \bigl|\langle \kappa f,g_+\rangle_{L^2(\mu)}\bigr|^2+\bigl|\langle \kappa f,g_-\rangle_{L^2(\mu)}\bigr|^2-2\langle \kappa f,g_+\rangle_{L^2(\mu)}\langle \kappa f,g_-\rangle_{L^2(\mu)}, $$ so by the non-negative case of $(6)$ (which we have already established) $$ \bigl|\langle \kappa f,g\rangle_{L^2(\mu)}\bigr|^2\leq c\|f\|^2_{L^2(\mu)}\bigl(\|g_+\|^2_{L^2(\mu)}+\|g_-\|^2_{L^2(\mu)}\bigr)=c\|f\|^2_{L^2(\mu)}\|g\|^2_{L^2(\mu)}, $$ as desired.


Initially I posted the following long-winded answer to the first question asked above. However, after some reflection I realized that a simpler and clearer answer was to address the two sticking points in your approach, as I have done above. Keep reading to see my initial long-winded answer...


Before starting the argument, let me note a basic but important property of any Markov transition kernel $\kappa$. Since $\kappa(x,\cdot)$ is a probability measure for all $x$, we have that $\kappa\cdot 1 = 1$, or more generally for any constant function $c\in L^2(\mu)$ we have that $\kappa\cdot c=c$.


To show that $$ \textrm{Var}_{\mu}[\kappa f]\leq c \textrm{Var}_{\mu}[f]\quad \text{for all }f\in L^2(\mu),\qquad (\star) $$ we start by using the following identity to simplify the expression.

Claim. For all $\mu,\kappa,$ and $f$ as above, let $g=f-\mu(f)\in L^2(\mu)$. Then $$\textrm{Var}_{\mu}[\kappa f]=\mu\bigl[(\kappa g)^2\bigr].$$

Proof. By definition, $\textrm{Var}_{\mu}(f)=\mu[(f-\mu f)^2]$. Thus $$ \textrm{Var}_{\mu}[\kappa f]=\mu\bigl[(\kappa f-\mu\kappa f)^2\bigr]. $$ By invariance of $\mu$ under $\kappa,$ we have that $\mu(\kappa f)=\mu f$. Thus, $$ \textrm{Var}_{\mu}[\kappa f]=\mu\bigl[(\kappa f-\mu f)^2\bigr]=\mu\bigl[(\kappa g)^2\bigr],$$ where in the last equality we used that $\kappa [\mu(f)] = \mu (f)$ since $\kappa$ fixes constant functions. $\square $
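The claim can be checked on a finite example. This sketch assumes a lazy cyclic walk on three points with uniform $\mu$ and an arbitrary test function `f`; both are illustrative choices.

```python
from fractions import Fraction as F

half = F(1, 2)
K = [[half, half, 0], [0, half, half], [half, 0, half]]  # hypothetical kernel
mu = [F(1, 3)] * 3                                       # invariant for K

def mean(u):
    return sum(m * v for m, v in zip(mu, u))

def apply_kernel(u):
    return [sum(k * v for k, v in zip(row, u)) for row in K]

f = [F(1), F(0), F(0)]
g = [v - mean(f) for v in f]            # g = f - mu(f), so mu(g) = 0

Kf, Kg = apply_kernel(f), apply_kernel(g)
var_Kf = mean([(v - mean(Kf)) ** 2 for v in Kf])   # Var_mu[kappa f]
second_moment_Kg = mean([v * v for v in Kg])       # mu[(kappa g)^2]
```

Both quantities agree because $\kappa$ fixes constants and $\mu(\kappa f)=\mu f$, exactly as in the proof.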

Applying the claim to the left and right sides of $(\star)$ (using the Markov kernels $\kappa$ and $\textrm{id}$ respectively), we see that $(\star)$ is a consequence of the following inequality: $$ \mu\bigl[(\kappa g)^2\bigr]\leq c \mu\bigl[g^2\bigr]\quad \text{for all }g\in L^2(\mu)\text{ satisfying }\mu(g)=0.\qquad (\star\star) $$

Now let's work from the other direction, starting by expressing the $\chi^2$ condition in terms of the quantities we have been working with above.

Claim. Let $f\in L^2(\mu)$ be a function such that $f\geq 0$ and $\mu(f)=1$. Then $f\mu$ is a probability measure satisfying $$ \chi^2(f\mu,\mu)=\textrm{Var}_{\mu}(f). $$

Proof. Let $\nu=f\mu$. Since $f\geq 0$ we have that $\nu$ is an unsigned measure, and since $\nu(E)=\mu(f)=1$, we have that $\nu$ is in fact a probability measure. By definition of the Radon-Nikodym derivative, we further have that $$ \nu\ll \mu\quad\text{ and }\quad\frac{d\nu}{d\mu}=f. $$ Therefore $$ \chi^2(\nu,\mu)=\mu\bigl[(f-1)^2\bigr]=\mu\bigl[(f-\mu f)^2\bigr]=\textrm{Var}_{\mu}(f), $$ as desired. $\square $

Taking $f\in L^2(\mu)$ with $f\geq 0$ and $\mu(f)=1$, we observe that the function $\kappa f$ satisfies these same conditions as well. Indeed, $\kappa f\geq 0$ since $(\kappa f)(x)$ is the result of integrating $f$ against the measure $\kappa(x,\cdot)$ and is thus non-negative. Moreover, $\mu(\kappa f)=1$ since $\kappa$ preserves $\mu$.

Thus, the claim applies to both $f$ and $\kappa f$, yielding that $$ \chi^2(f\mu,\mu)=\textrm{Var}_{\mu}(f)\quad\text{ and }\quad \chi^2\bigl[(\kappa f)\mu,\mu\bigr]=\textrm{Var}_{\mu}(\kappa f). $$

The hypothesis states that $$ \chi^2(\nu\kappa,\mu)\leq c\chi^2(\nu,\mu), $$ which we will apply to $\nu=f\mu$. Since $\nu\kappa=(\kappa f)\mu$, when we substitute the identities in the previous display into the left and right sides of the given hypothesis we deduce that $$ \textrm{Var}_{\mu}(\kappa f)\leq c \textrm{Var}_{\mu}(f),\qquad \text{for all }f\in L^2(\mu),f\geq 0,\mu(f)=1. $$ Both sides of this inequality are homogeneous in $f$: replacing $f$ by $Cf$ for any constant $C$ scales both the left and right sides by $C^2$. Thus, we obtain the more general inequality $$ \textrm{Var}_{\mu}(\kappa f)\leq c \textrm{Var}_{\mu}(f),\qquad \text{for all }f\in L^2(\mu),f\geq 0.\qquad (\star\star\star) $$

To be clear: Equation $(\star\star\star)$ has been deduced from the hypothesis, whereas equation $(\star\star)$ is what needs to be established in order to obtain the desired result.

Thus the essence of your question boils down (after these preliminary rewritings) to the implication $(\star\star\star)\implies (\star\star)$. We now prove this using a simple truncation argument.

Truncation argument. Fix $g\in L^2(\mu)$ with $\mu(g)=0$. Let $g_n=\max(g,-n)$, which converges in a monotone fashion to the function $g$ as $n\to\infty$. Applying $(\star\star\star)$ to $f_n=g_n+n$, we obtain that $$ \textrm{Var}_{\mu}(\kappa (g_n+n))\leq c \textrm{Var}_{\mu}(g_n+n)=c \textrm{Var}_{\mu}(g_n). $$ Since $\kappa$ fixes constant functions, $\kappa (g_n+n)=\kappa g_n + n$, so the previous display yields $$ \textrm{Var}_{\mu}(\kappa g_n)\leq c\textrm{Var}_{\mu}(g_n). $$ Taking the limit as $n\to\infty$ on both sides and applying the monotone convergence theorem yields $$ \textrm{Var}_{\mu}(\kappa g)\leq c\textrm{Var}_{\mu}(g), $$ giving us the claim $(\star\star)$ since $\mu(g)=\mu(\kappa g)=0$.
