Let's say you have the linear model $Y = XB + e$, where $Y$ is $n \times q$, $X$ is $n \times p$, and $B$ is $p \times q$.
Then:
```python
import numpy as np

def standard_error(X, Y):
    # Least squares coefficients, then the mean squared residual
    beta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(Y)
    return np.mean((Y - X.dot(beta)) ** 2)
```
In the setting of classical multivariate linear regression, we have the model:
$$Y = X \beta + \epsilon$$
where $X$ represents the independent variables, $Y$ represents multiple response variables, and $\epsilon$ is a Gaussian noise term whose rows are i.i.d. The noise has zero mean and can be correlated across response variables. The maximum likelihood solution for the weights is equivalent to the least squares solution, regardless of the noise correlations [1][2]:
$$\hat{\beta} = (X^T X)^{-1} X^T Y$$
This is equivalent to independently solving a separate regression problem for each response variable. This can be seen from the fact that the $i$th column of $\hat{\beta}$ (containing weights for the $i$th output variable) can be obtained by multiplying $(X^T X)^{-1} X^T$ by the $i$th column of $Y$ (containing values of the $i$th response variable).
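A quick numerical check of this equivalence (a minimal sketch; the simulated data and variable names are illustrative, not taken from any of the references):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 3, 2
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))

# Multivariate least squares: all q response columns at once
beta_joint = np.linalg.inv(X.T @ X) @ X.T @ Y

# Separate univariate regressions, one response column at a time
beta_separate = np.column_stack(
    [np.linalg.inv(X.T @ X) @ X.T @ Y[:, i] for i in range(q)]
)

print(np.allclose(beta_joint, beta_separate))  # True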
However, multivariate linear regression differs from separately solving individual regression problems because statistical inference procedures account for correlations between the multiple response variables (e.g. see [2],[3],[4]). For example, the noise covariance matrix shows up in sampling distributions, test statistics, and interval estimates.
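As a small illustration of the quantity involved, one standard estimate of that noise covariance is built from the residual matrix (a sketch under the same simulated setup as above; the unbiased divisor $n - p$ is used here, whereas the MLE divides by $n$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 3, 2
X, Y = rng.normal(size=(n, p)), rng.normal(size=(n, q))

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
E = Y - X @ beta_hat                 # n x q matrix of residuals

# q x q estimated noise covariance; this is the matrix that enters
# multivariate sampling distributions, test statistics, and intervals
Sigma_hat = E.T @ E / (n - p)
print(Sigma_hat)
```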
Another difference emerges if we allow each response variable to have its own set of covariates:
$$Y_i = X_i \beta_i + \epsilon_i$$
where $Y_i$ represents the $i$th response variable, and $X_i$ and $\epsilon_i$ represent its corresponding set of covariates and noise term. As above, the noise terms can be correlated across response variables. In this setting, there exist estimators that are more efficient than least squares and that cannot be reduced to solving a separate regression problem for each response variable. For example, see [1].
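To make that concrete, here is a minimal sketch of the seemingly unrelated regressions (SUR) idea from [1], using a hand-coded two-step feasible GLS on two simulated equations; the data, coefficients, and variable names are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
n = 200

# Two equations with different covariates and cross-equation error correlation
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])          # true error covariance
E = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
y1 = X1 @ np.array([1.0, 2.0]) + E[:, 0]
y2 = X2 @ np.array([-1.0, 0.5]) + E[:, 1]

# Step 1: equation-by-equation OLS, then estimate the error covariance from residuals
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
R = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
S = R.T @ R / n

# Step 2: feasible GLS on the stacked system (stacked equation by equation)
Z = block_diag(X1, X2)
y = np.concatenate([y1, y2])
W = np.kron(np.linalg.inv(S), np.eye(n))            # Sigma^{-1} (x) I_n
delta = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
# Stacked SUR coefficients; gains efficiency over OLS because the errors are
# correlated across equations and the regressors differ between equations
print(delta)
```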
References

1. Zellner (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias.
2. Helwig (2017). Multivariate linear regression. [Slides]
3. Fox and Weisberg (2011). Multivariate linear models in R. [Appendix to: An R Companion to Applied Regression]
4. Maitra (2013). Multivariate linear regression models. [Slides]
Best Answer
I think my comments have grown long enough for an answer...
One reason why you might want to look at the multivariate case rather than separate univariate cases is when there is strong dependence between the response variables. It's quite possible for each univariate response to show "no effect" while the multivariate analysis shows a strong one. See this plot of a difference between two groups on just two dimensions:
Note that here, $Y$ and $X$ are both DVs, and the grouping variable (the red/black indicator) is the lone IV in the 'regression'.
The issue is that the thing whose mean really differs between the two groups is not the variable $X$ or the variable $Y$ (that is, $\mu_{X2}-\mu_{X1}$ is almost zero, same for $Y$), but a particular linear combination - in the example, $Y-X$ - on which the means of the two groups strongly differ.
In that case, univariate $t$ tests find nothing, but a multivariate test sees it easily (equivalently, the comparison can be run as a univariate and a multivariate regression with a single IV, the group indicator).
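A small simulation of this situation (a sketch only: the group means, correlation, sample sizes, and the hand-coded two-sample Hotelling's $T^2$ test are illustrative choices, not taken from the answer above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
cov = [[1.0, 0.95], [0.95, 1.0]]                          # strongly correlated DVs
g1 = rng.multivariate_normal([0.0, 0.0], cov, size=n)
g2 = rng.multivariate_normal([0.15, -0.15], cov, size=n)  # groups differ mainly along Y - X

# Univariate t tests on each DV separately: usually non-significant here
for j in range(2):
    print(stats.ttest_ind(g1[:, j], g2[:, j]))

# Two-sample Hotelling's T^2 (a multivariate test): detects the difference
d = g1.mean(axis=0) - g2.mean(axis=0)
S_pooled = ((n - 1) * np.cov(g1.T) + (n - 1) * np.cov(g2.T)) / (2 * n - 2)
T2 = (n * n / (2 * n)) * d @ np.linalg.solve(S_pooled, d)
F = T2 * (2 * n - 2 - 1) / ((2 * n - 2) * 2)              # p = 2 dimensions
print(stats.f.sf(F, 2, 2 * n - 2 - 1))                     # small p-value
```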
The same issue applies to other, less simple regressions.