Why is variance only defined for square-integrable random variables?

Tags: probability · probability-theory · random-variables · variance

I'm learning probability theory on my own, so I have some basic questions.

Every book I have read defines variance only for square-integrable random variables. So my question is: why do they exclude the case where a random variable is only integrable?

If $X$ is an integrable random variable, then $(X-\mathbb{E}[X])^2$ is a non-negative measurable function and, therefore, its Lebesgue integral is well defined (possibly equal to $+\infty$). Hence, we could define the variance of an integrable random variable $X$ as $\text{Var}(X):=\mathbb{E}[(X-\mathbb{E}[X])^2]$. In this case we would still have properties such as

  • $\text{Var}(aX+b)=a^2\text{Var}(X)$
  • $\mathbb{P}(|X-\mathbb{E}[X]|\geq t)\leq \frac{\text{Var}(X)}{t^2}$ for all $t\in(0,\infty)$, using the "generalized" Chebyshev inequality (see the section "Measure-theoretic statement" in Wikipedia's article on Chebyshev's inequality); a numerical sketch of such a random variable follows this list.
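
A minimal numerical sketch of the situation described above (my addition, not part of the original question): a Pareto random variable with tail index $\alpha=1.5$ is integrable but not square-integrable, so under the definition above its variance is $\infty$. Empirically, the running sample mean settles near $\alpha/(\alpha-1)=3$ while the sample variance keeps growing with the sample size.

```python
# Sketch: an integrable but not square-integrable random variable.
# X = 1 + Y with Y drawn from numpy's pareto(alpha) is a classical Pareto
# variable on [1, inf); for alpha = 1.5 the mean is finite (= 3) but
# E[X^2] = infinity, so Var(X) = infinity under the extended definition.
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.5  # 1 < alpha < 2: finite first moment, infinite second moment

for n in (10**3, 10**4, 10**5, 10**6):
    x = rng.pareto(alpha, size=n) + 1.0
    # The sample mean stabilizes, while the sample variance does not.
    print(f"n={n:>8}  mean={x.mean():7.3f}  variance={x.var():14.1f}")
```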

Best Answer

For any random variable $X$, you can consider the extended definition \begin{align} \text{Var}(X):=\inf_{s\in\Bbb{R}}\Bbb{E}[(X-s)^2]. \end{align} This is a well-defined number in $[0,\infty]$, and it measures how far away (in the $L^2$ sense) your random variable is from the "best possible constant" (hence the infimum). If this quantity is $\infty$, then in particular (taking $s=0$) $X\notin L^2$. If this quantity is finite, then there is an $s_0\in\Bbb{R}$ such that $X-s_0\in L^2$; since the constant $s_0$ lies in $L^2$ (we're on a probability space), it follows that $X\in L^2$, and hence $X\in L^1$. You can therefore expand $\Bbb{E}[(X-s)^2]=\Bbb{E}[X^2]-2s\,\Bbb{E}[X]+s^2$ and minimize this quadratic in $s$ as usual to find that the best value of $s$ is indeed the mean $\Bbb{E}[X]$ (again, this makes sense because $X-s_0\in L^2\implies X\in L^2\implies X\in L^1$).
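
To spell out that last minimization (nothing beyond the standard completion of the square, under the assumption $X\in L^2$): \begin{align} \Bbb{E}[(X-s)^2]=\Bbb{E}[X^2]-2s\,\Bbb{E}[X]+s^2=\big(s-\Bbb{E}[X]\big)^2+\Bbb{E}[X^2]-\Bbb{E}[X]^2, \end{align} so the infimum over $s$ is attained at $s=\Bbb{E}[X]$ and equals $\Bbb{E}[X^2]-\Bbb{E}[X]^2=\Bbb{E}[(X-\Bbb{E}[X])^2]$; in other words, the extended definition agrees with the usual variance whenever $X\in L^2$.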

With this definition (and the standard measure-theoretic convention that $0\cdot \infty=0$) you can indeed prove the first equality of your post by simple casework ($X\in L^2$ vs. $X\notin L^2$, and in the latter case consider separately $a=0$ vs. $a\neq 0$ if you really want to convince yourself); a sketch of the substitution argument is given below. The inequality you wrote down is trivially true if $X\notin L^2$, because then the right-hand side of the estimate is $\infty$.
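
Here is a sketch of the main case, $a\neq 0$, using only the infimum definition and no integrability assumptions on $X$: the map $t\mapsto s=at+b$ is a bijection of $\Bbb{R}$, so \begin{align} \text{Var}(aX+b)=\inf_{s\in\Bbb{R}}\Bbb{E}[(aX+b-s)^2]=\inf_{t\in\Bbb{R}}\Bbb{E}[a^2(X-t)^2]=a^2\inf_{t\in\Bbb{R}}\Bbb{E}[(X-t)^2]=a^2\,\text{Var}(X), \end{align} where pulling the constant $a^2\in(0,\infty)$ out of the expectation and out of the infimum is legitimate even when the values are $\infty$. If $a=0$, both sides equal $0$: take $s=b$ on the left, and use the convention $0\cdot\infty=0$ on the right.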

But now comes the obvious fact, also mentioned in the comments, that just because something is true doesn't mean it is useful. As a result, we restrict to $X\in L^2$ so that the statements involving the variance are non-trivial. (Also, I think some texts define straight away $\text{Var}(X)=\Bbb{E}[X^2]-\Bbb{E}[X]^2$; this only makes sense if the right-hand side makes sense, and if, for instance, $\Bbb{E}[X^2]=\Bbb{E}[X]^2=\infty$, the subtraction isn't even well defined. So those texts take the simpler route of assuming $X\in L^2$ from the beginning, which keeps all the arithmetic in $\Bbb{R}$. With the "more general" definition of variance above, the quantity makes sense in $[0,\infty]$ for every random variable, so from a logical perspective it has that advantage; practically, though, both offer the same amount of useful information.)
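
A concrete example of that $\infty-\infty$ issue (my own illustration, not from the original discussion): take a non-negative $X$ with $\Bbb{P}(X=2^n)=2^{-n}$ for $n\geq 1$. Then $\Bbb{E}[X]=\sum_{n\geq 1}2^n\cdot 2^{-n}=\infty$ and $\Bbb{E}[X^2]=\sum_{n\geq 1}4^n\cdot 2^{-n}=\infty$, so $\Bbb{E}[X^2]-\Bbb{E}[X]^2$ is the ill-defined expression $\infty-\infty$, whereas the extended definition simply gives $\text{Var}(X)=\inf_{s\in\Bbb{R}}\Bbb{E}[(X-s)^2]=\infty$ (for any fixed $s$ we have $(X-s)^2\geq \tfrac{1}{2}X^2-s^2$, so each $\Bbb{E}[(X-s)^2]$ is infinite).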
