Solved – Multivariate response regressions vs many linear models

intuition, multivariate regression

Would anyone be willing to venture an intuitive description of the situations under which a multivariate response model is more appropriate than many linear regressions?

As an example, take a randomly allocated agricultural extension program and the yields of several different crops grown by farmers. You could run a separate model for each crop. Or you could aggregate the crops somehow. Or you could run a multivariate response model, in which the dependent variable is a matrix rather than a vector.

I've been reading up on the math of it all, but I haven't found a good intuitive description of the situations where these sorts of models are the most useful, nor their practical pitfalls. I get that the errors will be correlated between responses. Does this mean that you'd get more power in a situation where individual regressions would be underpowered? Is there any reason why coefficient matrices estimated in these models wouldn't have a causal interpretation if a variable is randomly allocated?

Best Answer

I think my comments have grown long enough for an answer...

One reason to look at the multivariate case rather than the separate univariate cases is when there's a lot of dependence between the response variables. It's quite possible for each univariate response to show "no effect" while the multivariate analysis shows a strong one. See this plot of a difference between two groups on just two dimensions:

Note that here, $Y$ and $X$ are both DVs, and the grouping variable (red/black indicator) is the (lone) IV in the 'regression'.

[Figure: two groups, two dependent DVs]

The issue is that what really differs between the two groups is not the mean of $X$ or the mean of $Y$ (that is, $\mu_{X2}-\mu_{X1}$ is almost zero, and likewise for $Y$), but the mean of a particular linear combination of them (in the example, $Y-X$), on which the two groups differ strongly.
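To give a rough sense of how large this effect can be, here is a small worked calculation under simplifying assumptions that are mine rather than part of the plotted example: both DVs have the same within-group standard deviation $\sigma$ and within-group correlation $\rho$, and the group means shift by $-\delta$ in $X$ and $+\delta$ in $Y$. The symbols $\sigma$, $\rho$, $\delta$ and $D$ are introduced only for this sketch.

$$
\begin{aligned}
D &= Y - X, \\
\mu_{D2}-\mu_{D1} &= (\mu_{Y2}-\mu_{Y1}) - (\mu_{X2}-\mu_{X1}) = 2\delta, \\
\operatorname{Var}(D) &= \sigma^2 + \sigma^2 - 2\rho\sigma^2 = 2\sigma^2(1-\rho), \\
\frac{\mu_{D2}-\mu_{D1}}{\operatorname{sd}(D)} &= \frac{2\delta}{\sigma\sqrt{2(1-\rho)}} = \sqrt{\frac{2}{1-\rho}}\,\frac{\delta}{\sigma}.
\end{aligned}
$$

With $\rho = 0.98$, for instance, the standardized difference on $Y-X$ is $\sqrt{2/0.02} = 10$ times the standardized difference $\delta/\sigma$ on either variable alone. The stronger the dependence between the DVs, the more a shift that is negligible on each margin can stand out along the right linear combination.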

In that case the univariate $t$ tests find nothing, but a multivariate test sees the difference easily. (Both kinds of test can be run as regressions with a single IV, the group indicator: ordinary regressions for the $t$ tests and a multivariate regression for the multivariate test.)
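A small simulation can illustrate the point. The sketch below is not the data behind the plot: the sample size, the correlation of 0.98, and the mean shift of $\mp 0.1$ are illustrative assumptions, and the two-sample Hotelling $T^2$ test is my choice of multivariate test, since the answer does not name one. The helper `hotelling_two_sample` is written out here for the example rather than taken from a library.

```python
import numpy as np
from scipy import stats

# Illustrative assumptions (not from the original example):
# two strongly correlated DVs, and a small mean shift in opposite directions.
rng = np.random.default_rng(0)
n = 200                                   # observations per group
rho = 0.98                                # within-group correlation of X and Y
cov = np.array([[1.0, rho], [rho, 1.0]])
shift = np.array([-0.1, 0.1])             # group 2 mean minus group 1 mean

g1 = rng.multivariate_normal(np.zeros(2), cov, size=n)   # columns: X, Y
g2 = rng.multivariate_normal(shift, cov, size=n)

# Univariate two-sample t tests, one DV at a time.
p_x = stats.ttest_ind(g1[:, 0], g2[:, 0]).pvalue
p_y = stats.ttest_ind(g1[:, 1], g2[:, 1]).pvalue

# t test on the linear combination Y - X, where the groups really differ.
p_diff = stats.ttest_ind(g1[:, 1] - g1[:, 0], g2[:, 1] - g2[:, 0]).pvalue


def hotelling_two_sample(a, b):
    """Two-sample Hotelling T^2 test assuming a common covariance matrix."""
    n1, n2, p = len(a), len(b), a.shape[1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled within-group covariance matrix.
    pooled = ((n1 - 1) * np.cov(a, rowvar=False)
              + (n2 - 1) * np.cov(b, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(pooled, diff)
    # Convert T^2 to an F statistic with (p, n1 + n2 - p - 1) degrees of freedom.
    f_stat = t2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
    p_value = stats.f.sf(f_stat, p, n1 + n2 - p - 1)
    return t2, p_value


t2, p_multi = hotelling_two_sample(g1, g2)

print(f"t test on X:      p = {p_x:.3g}")
print(f"t test on Y:      p = {p_y:.3g}")
print(f"t test on Y - X:  p = {p_diff:.3g}")
print(f"Hotelling T^2:    p = {p_multi:.3g}")
```

Under these assumed settings each univariate test has little power, because the shift on either margin is about one standard error, while the multivariate test and the test on $Y-X$ should both give very small p-values. The exact numbers depend on the seed, so none are quoted here.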

The same issue applies to other, less simple regressions.
