Solved – Why the first moment is standardized before computing higher moments, but higher moments are not


Wikipedia says:

For the second and higher moments, the central moments (moments about the mean, with c being the mean) are usually used rather than the moments about zero, because they provide clearer information about the distribution's shape.
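In the notation of that article, for a random variable $X$ with mean $\mu = \operatorname{E}[X]$, the $k$th moment about zero and the $k$th central moment are

$$\mu'_k = \operatorname{E}\left[X^k\right], \qquad \mu_k = \operatorname{E}\left[(X - \mu)^k\right].$$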

Could someone explain/convince me why this is true? Why is there a discrepancy?
This has always bugged me and I have never seen a good explanation for it. I just don't quite understand why or how standardization provides "clearer" information in one case but not in another.

For example:

  1. To compute the skewness, why not standardize both the mean and the variance?
  2. To compute the kurtosis, why not standardize the mean, the variance, and the skewness?
  3. To compute the $n$th moment, why not first standardize all the $m$th moments for $m < n$?
    If standardization is useful, then why only do this for $m = 1$?

Best Answer

Since the question was updated, I have updated my answer:

The first part (To compute the skewness, why not standardize both the mean and the variance?) is easy: that is precisely how it's done! See the definitions of skewness and kurtosis on Wikipedia.
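For reference, those definitions make skewness and kurtosis the third and fourth standardized moments, i.e. moments taken after both the mean and the variance have been normalized away:

$$\text{skewness} = \operatorname{E}\left[\left(\frac{X - \mu}{\sigma}\right)^{3}\right] = \frac{\mu_3}{\sigma^3}, \qquad \text{kurtosis} = \operatorname{E}\left[\left(\frac{X - \mu}{\sigma}\right)^{4}\right] = \frac{\mu_4}{\sigma^4}.$$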

The second part is both easy and hard. On one hand we could say that it is impossible to normalize a random variable to satisfy three moment conditions, since a linear transformation $X \to aX + b$ has only two free parameters. But on the other hand, why should we limit ourselves to linear transformations? Sure, shift and scale are by far the most prominent (maybe because they are sufficient most of the time, say for limit theorems), but what about higher-order polynomials, or taking logs, or convolving the variable with itself? In fact, isn't that what the Box-Cox transform is all about: removing skew?
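As a quick illustration of that last point, here is a minimal sketch (the lognormal sample, seed, and sample size are arbitrary choices, not from the original answer) showing scipy's Box-Cox transform driving the sample skewness toward zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # positive, strongly right-skewed

y, lam = stats.boxcox(x)  # lambda is chosen by maximum likelihood

print(f"skewness before: {stats.skew(x):.2f}")  # large and positive
print(f"skewness after:  {stats.skew(y):.2f}")  # close to zero
print(f"fitted lambda:   {lam:.2f}")            # near 0, i.e. roughly a log transform
```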

But in the case of more complicated transformations, I think, the context and the transformation itself become important, and maybe that is why there are no more "moments with names". That does not mean that random variables are not transformed or that the moments are not calculated; on the contrary. You just choose your transformation, calculate what you need, and move on.


The old answer, about why central moments represent shape better than raw moments:

The keyword is shape. As whuber suggested, by shape we want to consider the properties of the distribution that are invariant to translation and scaling. That is, when you consider the variable $X + c$ instead of $X$, you get the same distribution function (just shifted to the right or left), so we would like to say that its shape stayed the same.

The raw moments do change when you translate the variable, so they reflect not only the shape but also the location. In fact, you can take any random variable and shift it, $X \to X + c$, appropriately to get any value for, say, its raw third moment.

The same observation holds for all odd moments, and to a lesser extent for even moments (as you vary the shift, an even raw moment is bounded from below, and that lower bound does depend on the shape).

The central moments, on the other hand, do not change when you translate the variable, and that is why they are more descriptive of the shape. For example, if an even central moment is large, you know that the random variable has some mass not too close to the mean. Or if an odd central moment is zero, that suggests some symmetry around the mean (symmetry implies it, though the converse need not hold).
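A quick numerical check of both claims (a minimal sketch; the exponential sample is just an arbitrary skewed example):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=100_000)  # a skewed distribution

def raw_moment(a, k):
    return np.mean(a ** k)                    # moment about zero

def central_moment(a, k):
    return np.mean((a - a.mean()) ** k)       # moment about the (sample) mean

for c in (0.0, 5.0, -3.0):
    y = x + c                                 # pure translation: same shape
    print(f"shift {c:+.0f}: raw mu3' = {raw_moment(y, 3):9.2f},"
          f" central mu3 = {central_moment(y, 3):.2f}")
# the raw third moment swings with the shift; the central one stays put
```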

The same argument extends to scale, which is the transformation $X \to cX$. The usual normalization in this case is division by the standard deviation, and the corresponding moments are called normalized (or standardized) moments, at least on Wikipedia.
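Putting translation and scale together, a last sketch (same caveats as above): the normalized third moment, i.e. the skewness, is unchanged by any affine map $X \to aX + b$ with $a > 0$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=100_000)

# standardized moments divide out both location and scale, so any
# affine map a*X + b with a > 0 leaves them unchanged
print(f"skew(x)         = {stats.skew(x):.3f}")
print(f"skew(2.5*x + 7) = {stats.skew(2.5 * x + 7.0):.3f}")  # same value
print("kurtosis unchanged:",
      np.isclose(stats.kurtosis(x), stats.kurtosis(2.5 * x + 7.0)))
```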