Multivariate Regression – How to Understand the Need for Multivariate Regression over Univariate Regressions

inference, multiple regression, multivariate regression, regression

I just browsed through this wonderful book: Applied Multivariate Statistical Analysis by Johnson and Wichern. The irony is that I still do not understand the motivation for using multivariate (regression) models instead of separate univariate (regression) models. I went through stats.stackexchange posts 1 and 2, which explain (a) the difference between multiple and multivariate regression and (b) the interpretation of multivariate regression results, but I am not able to tease out the use of multivariate statistical models from all the information I find online about them.

My questions are:

  1. Why do we need multivariate regression? What is the advantage of considering outcomes simultaneously rather than individually when drawing inferences?
  2. When should we use multivariate models, and when should we use multiple univariate models (for multiple outcomes)?
  3. Take the example given on the UCLA site with three outcomes: locus of control, self-concept, and motivation. With respect to 1. and 2., can we compare the analysis when we run three univariate multiple regressions versus one multivariate multiple regression? How do we justify one over the other?
  4. I haven't come across many scholarly papers that utilize multivariate statistical models. Is this because of the multivariate normality assumption, the complexity of model fitting/interpretation or any other specific reason?

Best Answer

Be sure to read the full example on the UCLA site that you linked.

Regarding 1:
Using a multivariate model helps you (formally, inferentially) compare coefficients across outcomes.
In that linked example, they use the multivariate model to test whether the write coefficient is significantly different for the locus_of_control outcome vs for the self_concept outcome. I'm no psychologist, but presumably it's interesting to ask whether your writing ability affects/predicts two different psych variables in the same way. (Or, if we don't believe the null, it's still interesting to ask whether you have collected enough data to demonstrate convincingly that the effects really do differ.)
If you ran separate univariate analyses, it would be harder to compare the write coefficient across the two models. Both estimates would come from the same dataset, so they would be correlated, and the standard error of their difference depends on that correlation, which the separate fits do not report. The multivariate model accounts for this correlation.
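To make the point about correlated estimates concrete, here is a minimal sketch in Python/NumPy on simulated data (not the UCLA dataset). Everything in it is an assumption for illustration: the two covariates, the error correlation of 0.6, and the variable names. It relies on the standard result that in a multivariate linear model with a shared design matrix, the covariance of the coefficient estimates is the Kronecker product of the error covariance and (X'X)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical design: intercept plus two covariates (illustrative only).
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])

# Two outcomes whose errors are correlated (rho = 0.6 is an assumption).
Sigma_true = np.array([[1.0, 0.6], [0.6, 1.0]])
E = rng.multivariate_normal(np.zeros(2), Sigma_true, size=n)
B_true = np.array([[0.0, 0.0], [0.5, 0.3], [0.2, 0.2]])
Y = X @ B_true + E

# Multivariate OLS: one least-squares solve, one coefficient column per outcome.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The point estimates match what separate univariate regressions give.
b_sep_0 = np.linalg.lstsq(X, Y[:, 0], rcond=None)[0]
b_sep_1 = np.linalg.lstsq(X, Y[:, 1], rcond=None)[0]

# Residual covariance across outcomes: the piece separate fits never estimate.
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / (n - X.shape[1])
XtX_inv = np.linalg.inv(X.T @ X)

# Compare the coefficient of covariate j between outcome 0 and outcome 1.
j = 1
diff = B_hat[j, 0] - B_hat[j, 1]
s00, s11, s01 = Sigma_hat[0, 0], Sigma_hat[1, 1], Sigma_hat[0, 1]

# Correct variance uses the cross-outcome covariance s01.
se_joint = np.sqrt(XtX_inv[j, j] * (s00 + s11 - 2.0 * s01))
# Naive variance pretends the two estimates are independent.
se_naive = np.sqrt(XtX_inv[j, j] * (s00 + s11))

print(f"difference = {diff:.3f}")
print(f"SE using cross-outcome covariance = {se_joint:.3f}")
print(f"SE assuming independence          = {se_naive:.3f}")
```

With positively correlated errors, the independence-based standard error is too large, so a test built on it is less powerful than it should be; with negatively correlated errors it would be too small. In both cases the point estimates are the same as in the separate regressions, and only the inference about the comparison changes.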

Also, regarding 4:
There are some very commonly used multivariate models, such as Repeated Measures ANOVA. With an appropriate study design, imagine that you give each of several drugs to every patient, and measure each patient's health after every drug. Or imagine you measure the same outcome over time, as with longitudinal data, say children's heights over time. Then you have multiple outcomes for each unit (even when they're just repeats of "the same" type of measurement). You'll probably want to do at least some simple contrasts: comparing the effects of drug A vs drug B, or the average effects of drugs A and B vs placebo. For this, Repeated Measures ANOVA is an appropriate multivariate statistical model/analysis.
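The contrasts mentioned above can be sketched in a few lines of Python/NumPy. This is not a full Repeated Measures ANOVA, only the within-patient contrast step, run on simulated data. The sample size, the effect sizes, and the helper name `contrast_t` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40  # hypothetical number of patients

# Each patient has a personal baseline shared across all three treatments,
# which makes that patient's measurements correlated with each other.
baseline = rng.normal(50.0, 10.0, size=n)
effects = np.array([5.0, 3.0, 0.0])  # assumed effects: drug A, drug B, placebo
scores = baseline[:, None] + effects + rng.normal(0.0, 2.0, size=(n, 3))


def contrast_t(scores, weights):
    """Estimate, standard error, and t statistic for a within-patient contrast."""
    c = scores @ np.asarray(weights, dtype=float)  # one contrast value per patient
    se = c.std(ddof=1) / np.sqrt(len(c))
    return c.mean(), se, c.mean() / se


# Drug A vs drug B.
est_ab, se_ab, t_ab = contrast_t(scores, [1, -1, 0])
# Average of drugs A and B vs placebo.
est_avg, se_avg, t_avg = contrast_t(scores, [0.5, 0.5, -1])

# For comparison: the SE you would get by treating A and B as independent groups.
var_a = scores[:, 0].var(ddof=1)
var_b = scores[:, 1].var(ddof=1)
se_unpaired = np.sqrt(var_a / n + var_b / n)

print(f"A vs B:          estimate={est_ab:.2f}, SE={se_ab:.2f}, t={t_ab:.2f}")
print(f"(A+B)/2 vs plc.: estimate={est_avg:.2f}, SE={se_avg:.2f}, t={t_avg:.2f}")
print(f"A vs B SE ignoring the pairing: {se_unpaired:.2f}")
```

Each t statistic would be referred to a t distribution with n - 1 degrees of freedom. The contrast standard error is much smaller than the unpaired one because the shared patient baseline cancels within each patient, which is exactly the between-outcome correlation that a multivariate analysis exploits.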