Solved – Sufficient statistics for a multivariate Gaussian

descriptive statistics, matrix

I have heard of the term "sufficient statistics".

Suppose I have a matrix like this:

$\begin{bmatrix}
1 & 12 & 13 & 14 \\
2 & 22 & 23 & 24 \\
3 & 32 & 33 & 34 \\
4 & 42 & 43 & 44\\
5 & 52 & 53& 54
\end{bmatrix}$

I count the number of rows $n$, which is $5$.

I compute the linear and quadratic sums as follows.

Linear: $\sum_{i=1}^{n} x_{i}$

\begin{align}
(&1+2+3+4+5,
12+22+32+42+52, \\
&13+23+33+43+53,14+24+34+44+54)\\
&=\left(15,160,165,170 \right )
\end{align}

The quadratic sum must be computed as well, where $x_{i}$ is the $i$-th row written as a column vector (I omitted the result because it is a little long).

Quadratic: $\sum_{i=1}^{n} x_{i}x_{i}^{T}$
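For concreteness, the three quantities described above (the count $n$, the linear sum and the quadratic sum) could be computed for the example matrix roughly as follows. This is only a sketch in plain Python; the names `X`, `linear_sum` and `quadratic_sum` are made up for illustration and do not come from any library.

```python
# Example matrix from the question: one row per observation.
X = [
    [1, 12, 13, 14],
    [2, 22, 23, 24],
    [3, 32, 33, 34],
    [4, 42, 43, 44],
    [5, 52, 53, 54],
]


def linear_sum(rows):
    """Column-wise sum, i.e. sum_i x_i (hypothetical helper)."""
    return [sum(col) for col in zip(*rows)]


def quadratic_sum(rows):
    """Sum of outer products, i.e. sum_i x_i x_i^T, a d-by-d matrix."""
    d = len(rows[0])
    return [[sum(r[j] * r[k] for r in rows) for k in range(d)] for j in range(d)]


n = len(X)
s = linear_sum(X)
Q = quadratic_sum(X)

print(n)  # 5
print(s)  # [15, 160, 165, 170]
for row in Q:
    print(row)
```

Entry $(j, k)$ of the quadratic sum is the sum over rows of the product of columns $j$ and $k$, so the diagonal holds the per-column sums of squares and the matrix is symmetric.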

Could you explain why they are called sufficient statistics?
Could you point me to a tutorial with more details?

I have heard that this vector (the linear sum) and the quadratic matrix (the quadratic sum of the points) summarize the original data $x$.

Best Answer

For a univariate normal distribution the only parameters are the mean and the variance (or standard deviation), so those two pieces of information completely determine the distribution. The sum of the $x$'s divided by the number of points gives the mean, so the sum of the $x$'s contains enough information to get the mean. The sum of the $x^2$ values can similarly be used to find the variance (along with the mean or the sum of the $x$'s, and the sample size), so the sum of the $x$'s and the sum of the $x^2$'s are sufficient statistics.
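As a small illustration of the univariate case, the sketch below recovers the mean and variance from nothing but the sample size, the sum of the $x$'s and the sum of the $x^2$'s. It assumes the unbiased sample variance (divisor $n-1$); the maximum-likelihood estimate would divide by $n$ instead. The function name `mean_and_variance` is hypothetical, and the data are just the first column of the matrix in the question.

```python
import statistics


def mean_and_variance(n, sum_x, sum_x2):
    """Recover mean and sample variance from the sufficient statistics.

    Assumption: unbiased variance with divisor n - 1.
    """
    mean = sum_x / n
    var = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, var


xs = [1, 2, 3, 4, 5]            # first column of the example matrix
n = len(xs)
sum_x = sum(xs)                 # 15
sum_x2 = sum(x * x for x in xs) # 55

mean, var = mean_and_variance(n, sum_x, sum_x2)
print(mean, var)  # 3.0 2.5

# Cross-check against the standard library, which sees the raw data.
print(statistics.mean(xs), statistics.variance(xs))
```

One caveat: subtracting `sum_x ** 2 / n` from `sum_x2` can lose precision in floating point when the mean is large relative to the spread, so this direct formula is best read as a demonstration of the idea rather than a numerically robust recipe.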

Now for a multivariate normal the parameters are a vector of means and a matrix of variances and covariances. Similarly, the vector of column sums divided by the sample size gives the mean vector, and a combination of the quadratic (or cross-product) sum with the column sums and the sample size gives the variance/covariance matrix. So if you believe that your data come from a multivariate normal, then those sufficient statistics give all the information that you need. One approach to dealing with large data sets is to just update the sufficient statistics without ever keeping more than a few rows of the data in memory at a time.
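The streaming idea could look something like the sketch below: rows are fed in one at a time, only $n$, the linear sum and the quadratic sum are stored, and the mean vector and covariance matrix are recovered at the end. The class name `GaussianSufficientStats` and its methods are invented for this example, and the covariance again assumes the $n-1$ divisor.

```python
class GaussianSufficientStats:
    """Accumulates n, sum_i x_i and sum_i x_i x_i^T one row at a time.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, dim):
        self.dim = dim
        self.n = 0
        self.s = [0.0] * dim                        # linear sum
        self.Q = [[0.0] * dim for _ in range(dim)]  # quadratic sum

    def update(self, x):
        """Fold one observation into the running sums."""
        self.n += 1
        for j in range(self.dim):
            self.s[j] += x[j]
            for k in range(self.dim):
                self.Q[j][k] += x[j] * x[k]

    def mean(self):
        return [sj / self.n for sj in self.s]

    def covariance(self):
        """Sample covariance (assumed divisor n - 1): (Q - s s^T / n) / (n - 1)."""
        n, d = self.n, self.dim
        return [
            [(self.Q[j][k] - self.s[j] * self.s[k] / n) / (n - 1) for k in range(d)]
            for j in range(d)
        ]


def rows():
    """Stand-in for a data source too large to hold in memory."""
    for i in range(1, 6):
        yield [i, 10 * i + 2, 10 * i + 3, 10 * i + 4]


stats = GaussianSufficientStats(dim=4)
for row in rows():
    stats.update(row)  # the row can be discarded after this call

mu = stats.mean()
cov = stats.covariance()
print(mu)         # [3.0, 32.0, 33.0, 34.0]
print(cov[0][:2]) # [2.5, 25.0]
```

Memory use depends only on the number of columns, not on the number of rows. The same precision caveat as in the univariate formula applies to the subtraction inside `covariance`.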
