You are essentially looking for a multivariate measure of skewness and kurtosis. There are many; I would start with the most established ones, the multivariate skewness and kurtosis measures of Mardia (1970) [0].
It seems to me you are asking more for an implementation than about the measures themselves. I don't know of any Matlab implementation, but the R code below (from the R library psych) should be fairly easy to translate into Matlab:
mardia <- function(x, na.rm = TRUE, plot = TRUE) {
  cl <- match.call()
  x <- as.matrix(x)                  # in case it was a data frame
  if (na.rm) x <- na.omit(x)
  n <- dim(x)[1]
  p <- dim(x)[2]
  x <- scale(x, scale = FALSE)       # zero-center the columns
  S <- cov(x)
  S.inv <- solve(S)
  D <- x %*% S.inv %*% t(x)          # n x n matrix of Mahalanobis cross-products
  b1p <- sum(D^3)/n^2                # Mardia's multivariate skewness
  b2p <- tr(D^2)/n                   # Mardia's multivariate kurtosis; tr() is psych's trace, sum(diag(.))
  chi.df <- p*(p + 1)*(p + 2)/6
  k <- (p + 1)*(n + 1)*(n + 3)/(n*((n + 1)*(p + 1) - 6))
  small.skew <- n*k*b1p/6            # small-sample corrected skewness statistic
  M.skew <- n*b1p/6
  M.kurt <- (b2p - p*(p + 2))*sqrt(n/(8*p*(p + 2)))
  p.skew <- 1 - pchisq(M.skew, chi.df)
  p.small <- 1 - pchisq(small.skew, chi.df)
  p.kurt <- 2*(1 - pnorm(abs(M.kurt)))
  d <- sqrt(diag(D))                 # Mahalanobis distances, for the QQ plot
  if (plot) {
    qqnorm(d)
    qqline(d)
  }
  results <- list(n.obs = n, n.var = p, b1p = b1p, b2p = b2p, skew = M.skew,
                  small.skew = small.skew, p.skew = p.skew, p.small = p.small,
                  kurtosis = M.kurt, p.kurt = p.kurt, d = d, Call = cl)
  class(results) <- c("psych", "mardia")
  return(results)
}
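For reference, here is a minimal Python/NumPy sketch of the same computation (the function name and result keys mirror the R code above; it assumes complete cases and a non-singular covariance matrix, and skips the QQ plot and small-sample correction):

```python
import numpy as np
from scipy import stats

def mardia(x):
    """Mardia's multivariate skewness and kurtosis tests (sketch)."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    x = x - x.mean(axis=0)                   # zero-center the columns
    S = np.cov(x, rowvar=False)              # p x p sample covariance
    D = x @ np.linalg.solve(S, x.T)          # n x n Mahalanobis cross-products
    b1p = np.sum(D ** 3) / n ** 2            # multivariate skewness
    b2p = np.trace(D ** 2) / n               # multivariate kurtosis (sum of squared diagonal / n)
    chi_df = p * (p + 1) * (p + 2) / 6
    M_skew = n * b1p / 6
    M_kurt = (b2p - p * (p + 2)) * np.sqrt(n / (8 * p * (p + 2)))
    p_skew = 1 - stats.chi2.cdf(M_skew, chi_df)
    p_kurt = 2 * (1 - stats.norm.cdf(abs(M_kurt)))
    return dict(n_obs=n, n_var=p, b1p=b1p, b2p=b2p,
                skew=M_skew, p_skew=p_skew, kurtosis=M_kurt, p_kurt=p_kurt)

rng = np.random.default_rng(0)
res = mardia(rng.normal(size=(200, 3)))      # multivariate normal: expect large p-values
print(res)
```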
- [0] K. V. Mardia (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519-530.
Since the question was updated, I have updated my answer:
The first part (to compute the skewness, why not standardize both the mean and the variance?) is easy: that is precisely how it's done! See the definitions of skewness and kurtosis on Wikipedia.
The second part is both easy and hard. On one hand, we could say that it is impossible to normalize a random variable to satisfy three moment conditions, since a linear transformation $X \to aX + b$ allows only for two. But on the other hand, why should we limit ourselves to linear transformations? Sure, shift and scale are by far the most prominent (maybe because they are sufficient most of the time, say for limit theorems), but what about higher-order polynomials, taking logs, or convolving a variable with itself? In fact, isn't that what the Box-Cox transform is all about: removing skew?
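To illustrate the Box-Cox point, here is a quick sketch using SciPy's `boxcox` (which picks the exponent by maximum likelihood; the lognormal sample is just an illustrative choice of a strongly skewed variable):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # strongly right-skewed sample
y, lam = stats.boxcox(x)                           # ML estimate of the Box-Cox exponent

# the transformed sample has far less skew than the original
print(stats.skew(x), stats.skew(y), lam)
```

For a lognormal sample the estimated exponent comes out near zero, i.e. the transform is essentially a log, exactly the "remove the skew" behavior described above.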
But in the case of more complicated transformations, I think, the context and the transformation itself become important, so maybe that is why there are no more "moments with names". That does not mean that random variables are not transformed or that the moments are not calculated; on the contrary. You just choose your transformation, calculate what you need, and move on.
The old answer, about why centralized moments represent shape better than raw moments:
The keyword is shape. As whuber suggested, by shape we want to consider the properties of the distribution that are invariant to translation and scaling. That is, when you consider the variable $X + c$ instead of $X$, you get the same distribution function (just shifted to the right or left), so we would like to say that its shape stayed the same.
The raw moments do change when you translate the variable, so they reflect not only the shape but also the location. In fact, you can take any random variable and shift it, $X \to X + c$, appropriately to get any value for, say, its raw third moment.
The same observation holds for all odd moments and, to a lesser extent, for even moments (they are bounded from below, and the lower bound does depend on the shape).
Centralized moments, on the other hand, do not change when you translate the variable, which is why they are more descriptive of the shape. For example, if an even centralized moment is large, you know that the random variable has some mass not too close to the mean. And if an odd centralized moment is zero, you know that your random variable has some symmetry around the mean.
The same argument extends to scale, which is the transformation $X \to cX$. The usual normalization in this case is division by the standard deviation, and the corresponding moments are called standardized (or normalized) moments, at least on Wikipedia.
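The shift and scale invariance is easy to check numerically; a small sketch (the helper `standardized_moment` is mine, not a library function):

```python
import numpy as np

def standardized_moment(x, k):
    # z-scores are invariant under X -> a*X + b for a > 0,
    # so the k-th standardized moment depends only on shape
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return (z ** k).mean()

rng = np.random.default_rng(1)
x = rng.exponential(size=10_000)

# same third standardized moment before and after an affine transform
print(standardized_moment(x, 3), standardized_moment(3.0 * x + 7.0, 3))
```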
Best Answer
The moments are defined in terms of integrals.
For continuous random variables
$E(X)=\int_{-\infty}^\infty x f(x) dx$
More generally:
$E(X^k)=\int_{-\infty}^\infty x^k f(x) dx$
$E[(X-\mu)^k]=\int_{-\infty}^\infty (x-\mu)^k f(x) dx$
See Wikipedia on Moments (mathematics).
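For a concrete case where the integrals can be evaluated numerically, here is a sketch using SciPy's `quad` on a normal density (the parameter values $\mu = 2$, $\sigma = 1.5$ are arbitrary):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

mu, sigma = 2.0, 1.5
pdf = lambda t: stats.norm.pdf(t, loc=mu, scale=sigma)

# E[X] = integral of t * f(t); E[(X - mu)^2] = integral of (t - mu)^2 * f(t)
mean, _ = quad(lambda t: t * pdf(t), -np.inf, np.inf)
var, _ = quad(lambda t: (t - mu) ** 2 * pdf(t), -np.inf, np.inf)

print(mean, var)   # close to 2.0 and 2.25
```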
If we can evaluate the relevant integral, yes.
The skewness of a random variable is not the third raw moment of that variable; it is the third central moment divided by the cube of the standard deviation (the third standardized moment).
Wikipedia on skewness
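A numerical check of this distinction: the third raw moment, the third central moment, and the skewness generally all differ, and the last one matches `scipy.stats.skew` (with its default population formula):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.exponential(size=50_000)

mu = x.mean()
raw3 = (x ** 3).mean()               # E[X^3], the raw third moment
central3 = ((x - mu) ** 3).mean()    # E[(X - mu)^3], the third central moment
skewness = central3 / x.std() ** 3   # the third standardized moment

print(raw3, central3, skewness)
print(stats.skew(x))                 # agrees with `skewness`
```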
Variance is the second central moment, so it follows from the formula I gave above by putting $k=2$.
Yes, using basic properties of expectation, you can write $E[(X-\mu)^2]=E[X^2]-\mu^2$.
See Wikipedia on variance.
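This identity holds exactly for sample moments as well (with the sample mean in place of $\mu$), which makes it easy to sanity-check:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=1000)

mu = x.mean()
lhs = ((x - mu) ** 2).mean()      # E[(X - mu)^2]
rhs = (x ** 2).mean() - mu ** 2   # E[X^2] - mu^2

print(lhs, rhs)                   # identical up to rounding
```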
Strictly the integral is over the real line, but the pdf is only non-zero within its support, so effectively, yes.
\begin{eqnarray} E[(X-\mu)^3]&=&E[(X^3-3\mu X^2+3\mu^2 X - \mu^3)]\\ &=&E(X^3)-3\mu E(X^2)+3\mu^2 E(X) - E(\mu^3)\\ &=&E(X^3)-3\mu E(X^2)+2\mu^3 \end{eqnarray}
The general case is given by Wikipedia in the article on central moments.
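The same kind of numerical check works for the expansion above; with sample moments and the sample mean it holds exactly, up to floating point (the gamma sample is just an arbitrary skewed distribution):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.gamma(shape=2.0, size=1000)

mu = x.mean()
lhs = ((x - mu) ** 3).mean()                                   # E[(X - mu)^3]
rhs = (x ** 3).mean() - 3 * mu * (x ** 2).mean() + 2 * mu ** 3

print(lhs, rhs)
```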