I have an $N \times D$ matrix
$$\mathbf{Y} = \begin{bmatrix} — \mathbf{y}_1^\top — \\ — \mathbf{y}_2^\top — \\ \vdots\\ — \mathbf{y}_N^\top —\end{bmatrix}$$
where the $\mathbf{y}_i^\top$ are row vectors. The book I am reading says that subtracting the vector
$$\bar{\mathbf{y}} := \frac{1}{N}\sum_{n=1}^{N}{\mathbf{y}_n}$$
individually from every row of $\mathbf{Y}$ (as the row vector $\bar{\mathbf{y}}^\top$) will make the mean zero. First, why is this true?
I then experimented a little and realized that adding ALL the matrix elements together, dividing by $ND$, and subtracting the resulting number from every element also makes the mean of $\mathbf{Y}$ zero. Was this just a special case, or is it generally true that the two methods of normalizing are the same? I cannot see why they would be, so could anyone please convince me?
In fact, you get two different matrices but both with a mean of 0 :/
Best Answer
Let
$$\mathbf{Y} = \begin{bmatrix} — \mathbf{y}_1^\top — \\ — \mathbf{y}_2^\top — \\ \vdots\\ — \mathbf{y}_n^\top —\end{bmatrix}$$
where $\mathbf{y}_i \in \mathbb R^d$, so that $\mathbf{Y}$ is $n \times d$. Let the average of the rows be
$$\bar{\mathbf{y}}^\top := \frac{1}{n}\sum_{i=1}^{n}{\mathbf{y}_i}^\top = \frac 1n \mathbf{1}_n^\top \mathbf{Y}$$
Subtracting the average from every row,
$$\mathbf{Y} - \mathbf{1}_n \bar{\mathbf{y}}^\top = \cdots = \left( \mathbf{I}_n - \frac 1n \mathbf{1}_n \mathbf{1}_n^\top \right) \mathbf{Y}$$
and, thus,
$$\mathbf{1}_n^\top \left( \mathbf{Y} - \mathbf{1}_n \bar{\mathbf{y}}^\top \right) = \cdots = \mathbf{0}_d^\top$$
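The last identity says that every column of the row-centered matrix sums to zero. The mean of all entries is the average of the column means, so it is zero as well. The converse does not hold: subtracting the single overall mean from every entry makes the overall mean zero, but generally leaves nonzero column means. Here is a minimal sketch that checks this on a small example. The $3 \times 2$ matrix and the helper names (`column_means`, `center_rows`, `center_grand`, `grand_mean`) are made up for illustration, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def column_means(Y):
    """Mean of each column, i.e. the entries of the average row y_bar."""
    n = len(Y)
    return [sum(row[j] for row in Y) / n for j in range(len(Y[0]))]


def grand_mean(Y):
    """Mean of all entries: sum of everything divided by n * d."""
    return sum(sum(row) for row in Y) / (len(Y) * len(Y[0]))


def center_rows(Y):
    """Subtract the average row y_bar from every row of Y."""
    y_bar = column_means(Y)
    return [[v - m for v, m in zip(row, y_bar)] for row in Y]


def center_grand(Y):
    """Subtract the single grand mean from every entry of Y."""
    g = grand_mean(Y)
    return [[v - g for v in row] for row in Y]


# Hypothetical 3 x 2 example (n = 3 rows, d = 2 columns).
Y = [[Fraction(1), Fraction(10)],
     [Fraction(2), Fraction(20)],
     [Fraction(6), Fraction(30)]]

A = center_rows(Y)   # row-centered: every column mean becomes 0
B = center_grand(Y)  # grand-mean-centered: only the overall mean becomes 0

print("row-centered:   column means", column_means(A), "grand mean", grand_mean(A))
print("grand-centered: column means", column_means(B), "grand mean", grand_mean(B))
```

In this example both matrices have overall mean zero, but only the row-centered one has zero mean in each column. So the two normalizations are different in general, and they coincide only when all column means of $\mathbf{Y}$ are equal to begin with.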