Suppose $\mu_n$ and $\mu$ are probability measures such that $\mu_n \to \mu$ in total variation. I'm curious to what extent we can say $$\int f \ \mathrm{d}\mu_n \to \int f \ \mathrm{d}\mu.$$ Recall that convergence in total variation means $$\lim_{n \to \infty} \sup_{A \in \mathcal F}|\mu_n (A) - \mu(A)| = 0,$$ where $\mathcal F$ is the underlying $\sigma$-algebra; $f$ is of course a Borel function. I've looked around a bit (admittedly not too hard: I thumbed through the TOC and index of Billingsley's "Convergence of Probability Measures" and looked at the Wikipedia page) and it seems to me that this should hold under very mild requirements. Under weak convergence of probability measures, of course, $\int f \ \mathrm{d}\mu_n \to \int f \ \mathrm{d}\mu$ holds for all bounded, continuous functions, which completely characterizes this mode of convergence. Surely I can get away with more under total variation convergence: what I hope "getting away with more" entails is that I can widen the class of $f$ for which this happens. For example, since $\mu_n (A) \to \mu(A)$ for all $A$, rather than just those $A$ whose boundary has $\mu$-measure $0$, I would hope we could loosen the continuity assumption on $f$.
Convergence of Probability Measures in Total Variation
measure-theory, probability-theory
Related Solutions
I have taken both real analysis and probability theory. So this question has bugged me for a while. This is just my 2 cents.
- The weak* convergence point of view comes from the Riesz Representation Theorem: if $X$ is a locally compact Hausdorff (LCH) space, $M(X)$ is the space of complex Radon measures on $X$ with the total variation norm, and $C_0(X)$ is the space of continuous functions on $X$ that vanish at infinity, then $M(X)$ is isometrically isomorphic to the dual space of $C_0(X)$. The theory of normed vector spaces then gives you the weak* topology on $M(X)$. So in probability theory, if $X$ is LCH, convergence in distribution coincides with weak* convergence (provided the limit is itself a probability measure).
Reference: Chapter 7 of Folland Real Analysis: Modern Techniques and Their Applications.
- The weak convergence point of view comes directly from the weak topology generated by bounded continuous functions. If $X$ is a topological space and $P(X)$ is the space of probability measures on $X$, then $BC(X)$ (the bounded continuous functions on $X$) generates a weak topology on $P(X)$. Convergence in this weak topology is just convergence in distribution, hence the name. Even in this generality, you have a portion of the Portmanteau Theorem. Some textbooks require $X$ to be a metric space, and then the theory expands. Many textbooks require $X$ to be Polish, in which case this weak topology on $P(X)$ is metrizable. There is no explicit assumption of LCH, and there is no Riesz Representation Theorem to give you a weak* topology.
Reference: Wikipedia, other websites, google books, and online lecture notes.
Best, Xiang
I suspect that when you wrote $|\mu_n|(\Omega)\to|\mu|(\Omega)$ maybe that's not exactly what you meant. Replying to the question as stated:
You assume so little about $\mu$ that it really has nothing to do with the rest of what's going on; there's no relationship between $\lambda$ and $\mu$. One doesn't need to "construct" a counterexample; almost any random choice of $\mu_n$ and $\mu$ works.
Say $\mu_n=\delta_{1/n}$, a point mass at $1/n$. So $\mu_n\to\delta_0$. Let $\Omega=(-1,1)$ and let $\mu$ be any positive measure with $\mu(\Omega)=1$.
Or maybe to better illustrate how $\mu$ really has nothing to do with $\lambda$, let $\Omega=(2,3)$ and let $\mu$ be any measure with $\mu(\Omega)=0$. Or $\Omega=\emptyset$ and $\mu=0$.
Heh, let $\mu_n$ be any norm-bounded sequence, $\Omega=\emptyset$, $\mu=0$.
Edit: Regarding the edit made to the question, and the comment asking whether there's any significant difference: Well of course there's a huge difference, since in the new version we have a specific "sequence" of measures! In particular, for example, if I'm reading things correctly $\mu_\epsilon$ is supported in the annulus $A_\epsilon=\{1-\epsilon\le|x|\le1\}$.
If I have the picture right it seems clear that the gradient of $u_\epsilon$ is $\nabla u_\epsilon(x)=-\frac1\epsilon\frac x{|x|}$ or something like that in $A_\epsilon$, and $0$ elsewhere. So it seems clear that $\mu_\epsilon\to\lambda$, where $d\lambda=-n(x)\,dH$ (where $n$ is the outward unit normal on the sphere and $H$ is surface area on the sphere), which certainly appears to be absolutely continuous with respect to $H$.
Of course that could be all wrong, it's just my first impression. But note that the various things I say "seem clear" seem clear to me based on my picture of what $\mu_\epsilon$ actually is, not because of any general principle analogous to what you ask about in the original version of the question!
As a general rule, if they assert P and you don't see why P holds you might be better off actually stating what they actually assert and asking why it holds, instead of sort of guessing that they seem to be saying that P follows from Q and asking whether Q implies P.
Second edit: Two things.
(i) A conjecture regarding the sort of "soft" or "abstract" argument the authors might have had in mind: It's easy to see that $\|\mu_\epsilon\|$ is bounded. And $\mu_\epsilon$ has a certain sort of rotational symmetry (which I'm not going to try to define precisely; this isn't my argument after all); hence any weak limit $\lambda$ must have the same symmetry. It's clear that $\lambda$ must be supported on $S=\{|x|=1\}$, since the support of $\mu_\epsilon$ shrinks to $S$, and the only vector-valued measure on $S$ with that symmetry is $cn\,dH$.
That's pretty vague; I'm not going to try to make it more precise, since it's just my guess regarding sort of what the authors may have had in mind. But:
(ii) Why it seems clear to me that $\mu_\epsilon$ simply does converge to what I say it does:
First, in general if $u$ is a radial function, $$u(x)=\phi(|x|),$$ then, for $x\ne0$, $$\nabla u(x)=\phi'(|x|)\frac x{|x|}.$$ ("Advanced calculus": $\nabla u$ points in the direction of greatest increase, which is to say in a direction orthogonal to the level sets of $u$...) Hence $\nabla u_\epsilon$ is what I say it is above.
Now assume $f\in C_c(\Bbb R^n)$ and integrate in polar coordinates:
$$\int_{\Bbb R^n}f(x)\nabla u_\epsilon(x)\,dx=\int_S\frac{-1}{\epsilon}\int_{1-\epsilon}^1f(r\xi)\xi r^{n-1}\,dr\,dH(\xi).$$ And since $f$ is continuous it's clear that $$\lim_{\epsilon\to0}\frac{-1}{\epsilon}\int_{1-\epsilon}^1f(r\xi)\xi r^{n-1}\,dr=-f(\xi)\xi,$$ uniformly over $\xi\in S$.
("Polar coordinates": In general $$\int_{\Bbb R^n}g(x)\,dx=c_n\int_S\int_0^\infty g(r\xi)r^{n-1}\,dr\,dH(\xi).$$ Note that if $H$ is actual "surface area" on $S$, in particular not normalized to be a probability measure as is sometimes done in that formula, then $c_n=1$. See Folland, Real Analysis, or various other places.)
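For what it's worth, here is one way to justify the uniform limit claimed above; it is a sketch, and the auxiliary function $h$ is notation introduced here, not part of the original argument. Put $h(x)=f(x)|x|^{n-1}$, so that $f(r\xi)r^{n-1}=h(r\xi)$ and $h(\xi)=f(\xi)$ for $\xi\in S$. Since $h$ is continuous with compact support, it is uniformly continuous, and $|r\xi-\xi|=1-r\le\epsilon$ for $1-\epsilon\le r\le1$. Hence
$$\left|\frac1\epsilon\int_{1-\epsilon}^1h(r\xi)\xi\,dr-h(\xi)\xi\right|\le\frac1\epsilon\int_{1-\epsilon}^1|h(r\xi)-h(\xi)|\,dr\le\sup_{|x-y|\le\epsilon}|h(x)-h(y)|\to0\quad(\epsilon\to0),$$
and the bound on the right does not depend on $\xi$. Multiplying by $-1$ gives the limit $-f(\xi)\xi$, uniformly over $\xi\in S$.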
Best Answer
If $f$ is a bounded, measurable real function, and $\mu_n\to\mu$ in total variation, then $\int f \, d\mu_n\to \int f\, d\mu$. The reason is just that $f$ can be uniformly approximated by simple functions.
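The same conclusion can also be read off from a single estimate. This is a sketch with notation not used above: write $\|f\|_\infty=\sup|f|$, $d_n=\sup_{A\in\mathcal F}|\mu_n(A)-\mu(A)|$, $\nu_n=\mu_n-\mu$, let $\Omega$ be the whole space, and let $P_n$ be a positive set in a Hahn decomposition of the signed measure $\nu_n$. Because $\nu_n(\Omega)=0$, we have $\nu_n(P_n)=-\nu_n(P_n^c)=d_n$, so
$$\left|\int f\,d\mu_n-\int f\,d\mu\right|\le\|f\|_\infty\,|\nu_n|(\Omega)=\|f\|_\infty\bigl(\nu_n(P_n)-\nu_n(P_n^c)\bigr)=2\,\|f\|_\infty\,d_n\to0.$$
In particular the convergence is uniform over all measurable $f$ with $\|f\|_\infty\le1$.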
If $f$ is not bounded, $\int f\, d\mu_n$ need not converge to $\int f\, d\mu$. For a counterexample, look at measures on the real line, let $\mu_n$ give $\{n\}$ probability $\frac1n$ and $\{0\}$ probability $1-\frac1n$, let $\mu$ give $\{0\}$ probability $1$, and let $f$ be the identity. Then $\sup_A|\mu_n(A)-\mu(A)|=\frac1n$, so $\mu_n\to\mu$ in total variation, but, for all $n$, $\int f\, d\mu_n=n\cdot\frac1n=1\ne \int f \,d\mu=0$.
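The counterexample is easy to check numerically. The following is a minimal sketch, not part of the original answer: the helper names `mu_n`, `MU`, `tv_distance` and `integral` are made up for illustration, and the code assumes finitely supported measures stored as dictionaries mapping points to masses. For such measures, $\sup_A|p(A)-q(A)|$ equals half the sum of the pointwise mass differences.

```python
from fractions import Fraction


def mu_n(n):
    """The measure mu_n from the answer: mass 1/n at the point n, mass 1 - 1/n at 0."""
    return {0: 1 - Fraction(1, n), n: Fraction(1, n)}


# The limit measure mu: a unit point mass at 0.
MU = {0: Fraction(1)}


def tv_distance(p, q):
    """sup_A |p(A) - q(A)| for finitely supported probability measures p and q.

    For such measures the supremum equals half the sum of |p({x}) - q({x})|.
    """
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2


def integral(f, p):
    """Integral of f against the finitely supported measure p."""
    return sum(f(x) * mass for x, mass in p.items())


def identity(x):
    """The unbounded integrand f(x) = x from the answer."""
    return x


def capped(x):
    """A bounded integrand, min(x, 1), included for contrast."""
    return min(x, 1)


for n in (1, 10, 100, 1000):
    m = mu_n(n)
    print(n, tv_distance(m, MU), integral(identity, m), integral(capped, m))
```

Each printed row shows $n$, the total variation distance to $\mu$, the integral of the identity, and the integral of the bounded function $\min(x,1)$. Exact rational arithmetic via `Fraction` is used so that the comparison with $\frac1n$ is not blurred by floating-point rounding.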