Assume $X$ is an arbitrary random variable with zero mean and unit variance, and let $Y \sim \mathcal{N}(0,1)$. I have a guess, but I don't know whether it is correct:
\begin{align}
E \left[ X^{2n} \right] \le E \left[ Y^{2n} \right]
\end{align}
where $n \ge 1$. I have checked a few random variables and found that they always satisfy the above inequality. Is this a general law? If not, is there a counterexample?
Many thanks for any idea!
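For what it's worth, here is a quick sanity check in Python (an illustration consistent with the observation above, not a proof): it compares the even moments of a zero-mean, unit-variance uniform variable on $[-\sqrt{3}, \sqrt{3}]$, which are $E[X^{2n}] = 3^n/(2n+1)$, against the standard normal moments $E[Y^{2n}] = (2n-1)!!$.

```python
import math

# Even moments of a zero-mean, unit-variance uniform variable on [-sqrt(3), sqrt(3)]:
# E[X^{2n}] = 3^n / (2n + 1).  For the standard normal, E[Y^{2n}] = (2n - 1)!!.
def uniform_moment(n):
    return 3 ** n / (2 * n + 1)

def normal_moment(n):
    return math.prod(range(1, 2 * n, 2))  # double factorial (2n - 1)!!

for n in range(1, 6):
    # the uniform moment stays below the normal moment for each n here
    print(n, uniform_moment(n), normal_moment(n))
```

Of course, such spot checks can only support the conjecture for the particular distributions tried; they cannot settle it in general.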
Property of zero mean and unit variance random variable
Tags: expected value, moment-generating-functions, probability, random variables
Related Solutions
Recall that the mean $\mathbb{E}(X)$ of a random variable $X$ is a weighted average of the possible values of $X$, weighted by the probability $P(X = x)$ of each outcome $x$ of $X$. Hence, the mean of a function $f(X)$ is a weighted average of the possible values of $f(X)$, weighted by the probability $P(f(X) = f(x))$. But the probability that $f(X)$ takes the value $f(x)$ is the same as the probability that $X$ takes the value $x$ (this is not exactly true, since $f$ might take the same value for multiple $x$, but for the purpose of taking a weighted average it doesn't matter since the effective weight on $f(x)$ will be the sum of the probabilities of each such $x$ either way). This allows us to use the following formula to find the mean of $f(X)$, with sums or integrals depending on whether $X$ is continuous or discrete: $$\mathbb{E}(f(X)) = \int f(x) \cdot P(X = x) dx$$ $$\mathbb{E}(f(X)) = \sum f(x) \cdot P(X = x).$$
Similarly, the variance is a weighted average of the squared difference between each value $x$ and the average value $\mathbb{E}(X)$, weighted by the probability $P(X=x)$. So we can similarly calculate the variance of $f(X)$ "by hand" once we have the mean, using: $$\text{Var}(f(X)) = \int \left(f(x) - \mathbb{E}(f(X)) \right)^2 \cdot P(X = x) dx$$ $$\text{Var}(f(X)) = \sum \left(f(x) - \mathbb{E}(f(X)) \right)^2 \cdot P(X = x).$$
In both cases, the sum or integral is taken over all possible values $x$ of your variable $X$.
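As a concrete illustration of these formulas in the discrete case (my own toy example, with $X$ a fair six-sided die and $f(x) = x^2$), the weighted averages can be computed directly:

```python
from fractions import Fraction

# E[f(X)] and Var(f(X)) computed directly from the weighted-average formulas,
# with X a fair six-sided die and f(x) = x^2 as an illustrative choice
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def f(x):
    return x * x

mean_fX = sum(f(x) * p for x, p in pmf.items())                  # E[f(X)]
var_fX = sum((f(x) - mean_fX) ** 2 * p for x, p in pmf.items())  # Var(f(X))
print(mean_fX, var_fX)  # 91/6 and 5369/36
```

Exact rational arithmetic is used here only so the two formulas can be checked against hand computation; floating point works just as well.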
Note that, as per Henry's answer, naively plugging $f(x) = \log(x)$ into these formulas will get you in trouble if $X \sim \text{Poisson}(\lambda)$, as you'll have a diverging sum. For a Poisson distribution, which allows $X = 0$, taking the mean and variance of $\log(X)$ is only sensible if you first exclude $X = 0$ from your domain somehow. So this answer is conditional on $X$ remaining in the domain of $f$.
This behaviour is an example of conditional convergence. A perhaps simpler example to start with is $$ \int_{-1}^1 \frac{\mathrm{d}x}{x} \text{.}$$ This integrand goes to infinity as $x$ approaches $0$ from the right and to negative infinity as $x$ approaches $0$ from the left. If we successively approximate this integral as $$ \int_{-1}^{-M} \frac{\mathrm{d}x}{x} + \int_{M}^{1} \frac{\mathrm{d}x}{x} \text{,} $$ we have arranged to carefully cancel the positive and negative contributions. No matter what positive value we pick for $M$, these two integrals are finite and their values cancel. But this is not how we evaluate improper integrals. (An integral that only gives useful answers if you sneak up on the infinities in just the right way is too treacherous to trust. Taking this very special way of sneaking up on the infinities gives the Cauchy principal value.) Instead, we require that the two endpoints can squeeze in towards zero independently and that, regardless of how they do so, we get the same answer. Contrast the above with $$ \int_{-1}^{-M} \frac{\mathrm{d}x}{x} + \int_{M/2}^{1} \frac{\mathrm{d}x}{x} \text{,} $$ which is always positive (because we include more of the positive part than of the negative part): it equals $\log 2$ for every $M$, not $0$, so the answer depends on how we approach the singularity. Shrinking the inner endpoints at genuinely different rates, say $\int_{-1}^{-M} + \int_{M^2}^{1}$, makes the sum equal $-\log M$, which diverges to infinity as $M \to 0^+$; we can arrange divergence to negative infinity by moving the squared endpoint to the other integral.
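Because the antiderivative of $1/x$ is $\log|x|$, the truncated integrals have closed forms, and the endpoint-dependence is easy to tabulate. This sketch (my own numbers) pairs the inner endpoints symmetrically as $(-M, M)$, which cancels exactly, and then lopsidedly as $(-M, M^2)$, a variant of the construction above, which grows without bound as $M$ shrinks:

```python
import math

def left_part(M):   # integral of 1/x from -1 to -M, i.e. log(M)
    return math.log(M)

def right_part(M):  # integral of 1/x from M to 1, i.e. -log(M)
    return -math.log(M)

for M in (1e-1, 1e-3, 1e-6):
    symmetric = left_part(M) + right_part(M)      # always exactly 0
    lopsided = left_part(M) + right_part(M ** 2)  # equals -log(M), diverges as M -> 0
    print(M, symmetric, lopsided)
```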
The same thing is happening in your integral for the mean, $$ \int_{-\infty}^{\infty} y f(y) \,\mathrm{d}y \text{,} $$ because for very large positive $y$ we have $1/y - 1 \approx -1$, so this integrand is approximately proportional to $1/y$; similarly for very large negative $y$, where the integrand is again approximately proportional to $1/y$. If you restrict your integral to be symmetric around $1$, $\int_{1-M}^{1+M} \dots $, these two tails will cancel (not exactly, but the cancellation improves the farther away from $1$ you go). However, if you use $\int_{1-M}^{1+2M} \dots $, they won't cancel as well, and the uncancelled part will grow asymptotically like $\log M$.
This slow growth of the discrepancy has a consequence for numerical experiments. You will have to take many samples at very large values of $y$ and very carefully sum them so that the contributions from large $y$s aren't swamped by rounding errors in the values from small $y$s. Your plot shows that the integrand for the mean, $y f(y)$, is about $3$ for $y=1$. It's about $10^{-23}$ for $y \approx 100$. This means that to retain the necessary precision just over the range $y \in [-100,100]$, you will have to numerically integrate with at least $23$ digits of precision (plus several guard digits). Even more numerically difficult, because of the logarithmic rate of divergence of the right-hand and left-hand integrals, you will have to integrate over stupendous ranges of $y$ for any not carefully balanced cancellation to accumulate measurably. For instance, \begin{align} \int_{-10}^{-9} y f(y) \,\mathrm{d}y + \int_{9}^{10} y f(y) \,\mathrm{d}y &\approx 1.84\dots \times 10^{-18} \\ \int_{-100}^{-9} y f(y) \,\mathrm{d}y + \int_{9}^{100} y f(y) \,\mathrm{d}y &\approx 3.13\dots \times 10^{-18} \text{.} \end{align} (I've ignored the contribution from $-9$ to $9$ in both integrals so that the result isn't swamped by the $18$ orders of magnitude larger contribution from that region.) These will always be positive because the contribution from the positive $y$s is closer to the peak at $y=1$ than is the contribution from the negative $y$s.
So, in your simulations, were you sampling over truly stupendous ranges of $y$s so that you might observe the divergence? And, while doing so, were you keeping truly stupendous precision so that the very slow divergence wouldn't be swamped by rounding errors in the contribution from $y$s near $1$? Doing both of these things is computationally very challenging. Once you have that ironed out, you'll need to take the limits to infinity in different ways to avoid calculating some sort of principal value, and when you do this, you'll discover that the integrals do not converge and your estimates are not stable. For instance, \begin{align} \int_{1- 2 \cdot 10}^{1-9} y f(y) \,\mathrm{d}y + \int_{1+9}^{1+1 \cdot 10} y f(y) \,\mathrm{d}y &\approx 6.644\dots \times 10^{-19} \\ \int_{1- 2 \cdot 100}^{1-9} y f(y) \,\mathrm{d}y + \int_{1+9}^{1+1 \cdot 100} y f(y) \,\mathrm{d}y &\approx 1.292\dots \times 10^{-18} \end{align} and each time we increment the power of $10$ used for the outermost endpoints, the discrepancy will approximately double. Once we reach $10^{53}$-ish, the discrepancy will finally be as large as the contribution from $y \in [1-9,1+9]$.
To sum up, numerically detecting the divergence of this integral will require some very careful programming and numerical analysis. (Alternatively, you can do what I did for the estimates above and use a big symbolic and numerics application like Maple or Mathematica.)
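To see the same endpoint-dependence without heroic precision, one can swap in a standard Cauchy density as a stand-in whose integrand has the same $1/y$ tail behaviour (an assumption for illustration; this is not the density from the question). Symmetric truncation of $\int y f(y)\,\mathrm{d}y$ gives $0$, while the lopsided truncation over $[-M, 2M]$ settles on a different, nonzero value, so the improper integral cannot exist:

```python
import math

def integrand(y):
    # y * f(y) for the standard Cauchy density f(y) = 1 / (pi * (1 + y^2));
    # its tails behave like 1 / (pi * y), the same 1/y decay discussed above
    return y / (math.pi * (1.0 + y * y))

def midpoint(a, b, n=100000):
    # simple midpoint-rule quadrature, adequate for this smooth integrand
    h = (b - a) / n
    return h * sum(integrand(a + (i + 0.5) * h) for i in range(n))

for M in (10.0, 100.0):
    symmetric = midpoint(-M, M)       # odd integrand: cancels to ~0
    lopsided = midpoint(-M, 2 * M)    # settles near log(4)/(2*pi) ~ 0.22, not 0
    print(M, symmetric, lopsided)
```

The closed form for the lopsided truncation is $\frac{1}{2\pi}\log\frac{1+4M^2}{1+M^2} \to \frac{\log 4}{2\pi}$, which is why the second column stabilises near $0.22$ rather than $0$.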
Best Answer
In general, a random variable with finite mean and variance does not even need to have a finite fourth moment. Even if we require a finite fourth moment your conjecture does not hold, as shown by this counterexample:
Let $Z$ follow a Student $t$-distribution with $\nu = 5$ degrees of freedom, and set $X = \sqrt{\frac{3}{5}}\, Z$. Then $E[X] = 0$ and $E[X^2] = 1$, but $E[X^4] = 9$, whereas $E[Y^4] = 3$ for $Y \sim \mathcal{N}(0,1)$, so the proposed inequality already fails for $n = 2$.
In other words, $X$ is a random variable with positive excess kurtosis.
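The moments quoted here follow from the standard Student-$t$ formulas $E[Z^2] = \nu/(\nu-2)$ (for $\nu > 2$) and $E[Z^4] = 3\nu^2/\big((\nu-2)(\nu-4)\big)$ (for $\nu > 4$); a short exact-arithmetic check:

```python
from fractions import Fraction

nu = 5
EZ2 = Fraction(nu, nu - 2)                        # E[Z^2] = 5/3 for Student t, nu > 2
EZ4 = Fraction(3 * nu * nu, (nu - 2) * (nu - 4))  # E[Z^4] = 25 for nu > 4
scale2 = 1 / EZ2                                  # squared scale that gives unit variance
fourth_moment = scale2 ** 2 * EZ4                 # fourth moment of the scaled variable
print(fourth_moment)  # 9, versus 3 for the standard normal
```

Any $\nu > 4$ close enough to $4$ gives an even larger fourth moment, so the counterexample is far from borderline.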