Probability – Can Identical Expectation and Variance in Random Variables Differ in Higher Moments?

distributions, mathematical-statistics, moments, probability, random variable

I was thinking about the meaning of location-scale family.
My understanding is that for every member $X$ of a location-scale family with location parameter $a$ and scale parameter $b$, the distribution of $Z = (X-a)/b$ does not depend on any parameters and is the same for every $X$ belonging to that family.

So my question is: could you provide an example where two random variables from the same distribution family are standardized, but the results do not have the same distribution?

Say $X$ and $Y$ come from the same distribution family (where by family I mean, for example, both normal or both gamma, and so on).
Define:

$Z_1 = \dfrac{X-\mu}{\sigma}$

$Z_2 = \dfrac{Y-\mu}{\sigma}$

We know that both $Z_1$ and $Z_2$ have the same expectation and variance: $\mu_Z = 0$, $\sigma^2_Z = 1$.

But can they have different higher moments?

My attempt to answer this question is that if the distribution of $X$ and $Y$ depends on more than 2 parameters, then it could be. I am thinking of the generalized Student's $t$, which has 3 parameters.

But if the number of parameters is $\le 2$ and $X$ and $Y$ come from the same distribution family with the same expectation and variance, does it mean that $Z_1$ and $Z_2$ have the same distribution (and hence the same higher moments)?

Best Answer

There is apparently some confusion as to what a family of distributions is, and how to count free parameters versus free plus fixed (assigned) parameters. Those questions are an aside, unrelated to the intent of the OP and of this answer. I do not use the word "family" herein because it is confusing. For example, a family according to one source is the result of varying the shape parameter. @whuber states that a "parameterization" of a family is a continuous map from a subset of $\mathbb{R}^n$, with its usual topology, into the space of distributions, whose image is that family. I will use the word "form", which covers both the intended usage of the word family and parameter identification and counting. For example, the formula $x^2-2x+4$ has the form of a quadratic formula, i.e., $a_2x^2+a_1x+a_0$, and if $a_1=0$ the formula is still of quadratic form. However, when $a_2=0$ the formula is linear, and the form is no longer complete enough to contain a quadratic shape term. Those who wish to use the word family in a proper statistical context are encouraged to contribute to that separate question.

Let us answer the question "Can they have different higher moments?". There are many such examples. We note in passing that the question appears to be about symmetric PDFs, which are the ones that tend to have location and scale in the simple bi-parameter case. The logic: Suppose there are two density functions with different shapes having two identical (location, scale) parameters. Then there is either a shape parameter that adjusts shape, or, the density functions have no common shape parameter and are thus density functions of no common form.

Here is an example of how the shape parameter figures into it: the generalized error density function, whose kurtosis appears to be freely selectable.

[Figure: generalized error density functions. By Skbkekas - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=6057753]

The PDF (A.K.A. "probability" density function, note that the word "probability" is superfluous) is $$\dfrac{\beta}{2\alpha\Gamma\Big(\dfrac{1}{\beta}\Big)} \; e^{-\Big(\dfrac{|x-\mu|}{\alpha}\Big)^\beta}$$

The mean (and location) is $\mu$, the scale is $\alpha$, and $\beta$ is the shape. Note that it is easier to present symmetric PDFs, because those PDFs often have location and scale as the simplest two-parameter cases, whereas asymmetric PDFs, like the gamma PDF, tend to have shape and scale as their simplest-case parameters. Continuing with the error density function, the variance is $\dfrac{\alpha^2\Gamma\Big(\dfrac{3}{\beta}\Big)}{\Gamma\Big(\dfrac{1}{\beta}\Big)}$, the skewness is $0$, and the excess kurtosis is $\dfrac{\Gamma\Big(\dfrac{5}{\beta}\Big)\Gamma\Big(\dfrac{1}{\beta}\Big)}{\Gamma\Big(\dfrac{3}{\beta}\Big)^2}-3$. Thus, if we set the variance to be 1, then we assign the value of $\alpha$ from $\alpha ^2=\dfrac{\Gamma \left(\dfrac{1}{\beta }\right)}{\Gamma \left(\dfrac{3}{\beta }\right)}$ while varying $\beta>0$, so that the excess kurtosis is selectable anywhere in the open range from $-6/5$ (approached as $\beta\rightarrow\infty$, the uniform limit) to $\infty$ (approached as $\beta\rightarrow 0$).
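As a minimal numerical sketch of the formulas above (the function names are my own, chosen for illustration, and only Python's standard library is assumed), one can fix the variance at 1 by solving for $\alpha$ and then watch the excess kurtosis change with $\beta$:

```python
from math import gamma, sqrt


def gen_error_variance(alpha: float, beta: float) -> float:
    """Variance of the generalized error density: alpha^2 * G(3/b) / G(1/b)."""
    return alpha ** 2 * gamma(3.0 / beta) / gamma(1.0 / beta)


def gen_error_excess_kurtosis(beta: float) -> float:
    """Excess kurtosis G(5/b) * G(1/b) / G(3/b)^2 - 3; it depends on beta only."""
    return gamma(5.0 / beta) * gamma(1.0 / beta) / gamma(3.0 / beta) ** 2 - 3.0


def unit_variance_alpha(beta: float) -> float:
    """Scale alpha that forces the variance to be exactly 1 for a given beta."""
    return sqrt(gamma(1.0 / beta) / gamma(3.0 / beta))


if __name__ == "__main__":
    # Same mean (0) and variance (1) in every row, different fourth moment.
    for beta in (0.5, 1.0, 2.0, 4.0, 50.0):
        alpha = unit_variance_alpha(beta)
        print(
            f"beta={beta:5.1f}  alpha={alpha:.6f}  "
            f"variance={gen_error_variance(alpha, beta):.6f}  "
            f"excess kurtosis={gen_error_excess_kurtosis(beta):+.6f}"
        )
```

For $\beta=2$ (normal) the excess kurtosis is $0$, and for $\beta=1$ (Laplace) it is $3$, even though both rows have variance 1.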

That is, if we want to vary higher order moments, and if we want to maintain a mean of zero and a variance of 1, we need to vary the shape. This implies three parameters, which in general are 1) the mean or otherwise the appropriate measure of location, 2) the scale to adjust the variance or other measure of variability, and 3) the shape. IT TAKES at least THREE PARAMETERS TO DO IT.
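A rough simulation sketch of this point follows. It assumes the standard fact that if $G\sim\text{Gamma}(1/\beta, 1)$ and $S=\pm1$ with equal probability, then $\mu+\alpha S\,G^{1/\beta}$ has the generalized error density; the helper names, the seed, and the sample size are arbitrary choices of mine for illustration.

```python
import random
from math import gamma, sqrt


def sample_standardized_gen_error(beta: float, n: int, rng: random.Random) -> list:
    """Draw n values with mean 0 and variance 1 from the generalized error density."""
    # Scale chosen so that alpha^2 * G(3/beta) / G(1/beta) = 1.
    alpha = sqrt(gamma(1.0 / beta) / gamma(3.0 / beta))
    draws = []
    for _ in range(n):
        g = rng.gammavariate(1.0 / beta, 1.0)        # (|x| / alpha)^beta ~ Gamma(1/beta, 1)
        sign = 1.0 if rng.random() < 0.5 else -1.0   # symmetric about zero
        draws.append(sign * alpha * g ** (1.0 / beta))
    return draws


def sample_moments(xs: list) -> tuple:
    """Return (mean, variance, excess kurtosis) of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, var, m4 / var ** 2 - 3.0


if __name__ == "__main__":
    rng = random.Random(0)
    for beta in (1.0, 2.0, 4.0):
        mean, var, kurt = sample_moments(sample_standardized_gen_error(beta, 100_000, rng))
        print(f"beta={beta}: mean={mean:+.3f} variance={var:.3f} excess kurtosis={kurt:+.3f}")
```

The sample means and variances should agree across values of $\beta$ up to sampling noise, while the sample kurtosis should not.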

Note that if we make the substitutions $\beta=2$, $\alpha=\sqrt{2}\sigma$ in the PDF above, we obtain $$\frac{e^{-\frac{(x-\mu )^2}{2 \sigma ^2}}}{\sqrt{2 \pi } \sigma }\;,$$

which is a normal distribution's density function. Thus, the generalized error density function is a generalization of the normal distribution's density function. There are many ways to generalize a normal distribution's density function. Another example, but with the normal distribution's density function only as a limiting value, and not at a mid-range substitution value as for the generalized error density function, is the Student's $t$ density function. Using the Student's $t$ density function, we would have a rather more restricted selection of kurtosis. Its shape parameter is $\textit{df}$, which must satisfy $\textit{df}>2$ for the variance to be finite (and $\textit{df}>4$ for the kurtosis to be finite). Moreover, $\textit{df}$ is not actually limited to positive integer values; in general it can be any positive real number. The Student's $t$ only becomes normal in the limit as $\textit{df}\rightarrow\infty$, which is why I did not choose it as an example. It is neither a good example nor is it a counter example, and in this I disagree with @Xi'an and @whuber.
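To make the Student's $t$ restriction concrete, here is a small sketch using the standard closed forms for that distribution: variance $\textit{df}/(\textit{df}-2)$ for $\textit{df}>2$, and excess kurtosis $6/(\textit{df}-4)$ for $\textit{df}>4$. The function names are hypothetical, and returning infinity in the intermediate ranges follows the usual convention.

```python
from math import inf


def student_t_variance(df: float) -> float:
    """Variance of a Student's t variable: df / (df - 2) when df > 2."""
    if df <= 1:
        raise ValueError("variance is undefined for df <= 1")
    if df <= 2:
        return inf  # the second moment diverges for 1 < df <= 2
    return df / (df - 2.0)


def student_t_excess_kurtosis(df: float) -> float:
    """Excess kurtosis of a Student's t variable: 6 / (df - 4) when df > 4."""
    if df <= 2:
        raise ValueError("kurtosis is undefined for df <= 2")
    if df <= 4:
        return inf  # the fourth moment diverges for 2 < df <= 4
    return 6.0 / (df - 4.0)


if __name__ == "__main__":
    # After dividing by sqrt(variance), every row has mean 0 and variance 1,
    # but the excess kurtosis still depends on df and only tends to 0 as df grows.
    for df in (4.5, 5.0, 10.0, 30.0, 1000.0):
        print(f"df={df:7.1f}  variance={student_t_variance(df):.4f}  "
              f"excess kurtosis={student_t_excess_kurtosis(df):.4f}")
```

The excess kurtosis here is always positive, so the selectable range is narrower than for the generalized error density function, and the normal value of $0$ is reached only in the limit.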

Let me explain this further. One can choose two of many arbitrary two-parameter density functions to have, as an example, a mean of zero and a variance of one. However, they will not all be of the same form. The question, however, relates to density functions of the SAME form, not different forms. The claim has been made that deciding which density functions have the same form is an arbitrary assignment, since it is a matter of definition, and on that my opinion differs. I do not agree that this is arbitrary, because either one can make a substitution that converts one density function into the other, or one cannot. In the first case, the density functions are of similar form; in the second case, where no substitution can make the density functions equivalent, they are of different form.

Thus, using the example of the Student's $t$ PDF, the choices are either to consider it a generalization of a normal PDF, in which case a normal PDF has a permissible form for a Student's $t$ PDF, or not, in which case the Student's $t$ PDF is of a different form from the normal PDF and thus is irrelevant to the question posed.

We can argue this many ways. My opinion is that a normal PDF is a sub-selected form of a Student's $t$ PDF, but that a normal PDF is not a sub-selection of a gamma PDF, even though a limiting value of a gamma PDF can be shown to be a normal PDF. My reason is that in the normal/Student's $t$ case the support is the same, whereas in the normal/gamma case the support is infinite versus semi-infinite, which is the required incompatibility.
