What are the differences between Dirichlet regression and log-ratio analysis?

Tags: compositional-data, modeling, references, regression

Compositional data can be analyzed either with Dirichlet regression or with log-ratio analysis, as pioneered by John Aitchison.

My questions are

  1. What are the main differences in assumptions between these two models? When should you prefer one over the other?
  2. Are there any "methods" that one topic allows which the other doesn't? My current data set has multiple independent variables (both factors and continuous), and I would like to model both fixed and random effects, and then do parameter estimation, test hypotheses, find confidence intervals, etc.
  3. What are the best resources to learn these two topics from? The log-ratio analysis seems to be the topic of many books, but on the other hand, Dirichlet regression seems to be mainly covered in small lecture notes (20-30 pages).

Best Answer

Log-ratio methods are a mathematical transform, whereas Dirichlet regression is a particular probabilistic model.

  1. To better understand the difference, let's think about a common probabilistic model applied to log-ratio transformed data. If you apply a multivariate normal model to either Additive Log-ratio (ALR) or Isometric Log-ratio (ILR) transformed data, it is equivalent to applying a multivariate logistic-normal model to the original compositional dataset (i.e., the ALR or ILR transform of a logistic-normal distribution is multivariate normal in the transformed space). Note that there are many different statistical models that can be applied to log-ratio transformed compositional data, whereas Dirichlet regression is a single model.

    Now a good question becomes: what is the difference between the Dirichlet distribution and the logistic-normal distribution? The Dirichlet distribution (and Dirichlet regression by extension) assumes that the compositional parts (the variables) are independent except for the sum constraint. One way to see this is that a Dirichlet draw can be built by normalizing independent gamma variables, so the only dependence comes from dividing by the total, and the resulting correlations between parts are always negative. The logistic-normal distribution, on the other hand, allows for covariation between the parts in addition to the sum constraint. In this sense the logistic-normal distribution is more flexible and is often better able to capture the covariation between variables that may be of interest to a researcher. That said, the logistic-normal distribution does not allow for complete independence between the parts as the Dirichlet distribution does (although it can get close enough for many approximations).

  2. Again, log-ratio methods are a data transform, not a statistical model. Many different models can be applied to the transformed data, covering everything from mixed-effects modeling to hypothesis testing. In addition, both logistic-normal regression and Dirichlet regression can do all the things you mention. The key difference between logistic-normal regression and Dirichlet regression is whether you want to allow some level of dependence between the variables or you want to assume complete independence between them (excluding the dependence that arises from the sum constraint of compositional data).

  3. Dirichlet regression: I would do a Google search and find some papers that discuss it. There is a paper discussing the DirichletReg package for R, which appears to be the white paper for that package. With regard to compositional data analysis, I would recommend Modeling and Analysis of Compositional Data by Pawlowsky-Glahn, Egozcue, and Tolosana-Delgado. It is a really wonderful book. For a very applied book, see Analyzing Compositional Data with R by van den Boogaart and Tolosana-Delgado.
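
To make point 1 a bit more concrete, here is a minimal Python sketch of the ALR transform and its inverse, using only the standard library. The function names `alr` and `alr_inv` and the example values are my own choices for illustration, not taken from any particular package. Drawing normal coordinates and mapping them back through the inverse gives a logistic-normal composition (here with a diagonal covariance, which is just one special case of the multivariate normal).

```python
import math
import random


def alr(x):
    """Additive log-ratio: log of every part relative to the last part."""
    ref = x[-1]
    return [math.log(xi / ref) for xi in x[:-1]]


def alr_inv(y):
    """Inverse ALR: map D-1 real coordinates back to a D-part composition."""
    expd = [math.exp(yi) for yi in y] + [1.0]
    total = sum(expd)
    return [e / total for e in expd]


# Round trip on a 3-part composition (illustrative values).
composition = [0.2, 0.3, 0.5]
coords = alr(composition)      # two unconstrained real coordinates
recovered = alr_inv(coords)    # back on the simplex

# Normal draws in ALR space, mapped back, form a logistic-normal draw.
# The means and standard deviations here are arbitrary assumptions.
rng = random.Random(0)
logistic_normal_draw = alr_inv([rng.gauss(0.0, 1.0), rng.gauss(0.5, 1.0)])
```

The ALR coordinates live in ordinary real space, which is why any standard multivariate model can be applied to them.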
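
The contrast between the Dirichlet and logistic-normal distributions in point 1 can also be checked by simulation. The sketch below is only an illustration under assumed parameters (the alpha vector, the correlation `rho`, and the helper names are all hypothetical). It draws Dirichlet samples by normalizing independent gamma variables and logistic-normal samples from strongly correlated normal coordinates, then compares the sample correlation between the first two parts.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def dirichlet_draw(rng, alpha):
    """Normalize independent gamma draws; the only coupling is the total."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [gi / total for gi in g]


def logistic_normal_draw(rng, rho):
    """3-part logistic-normal draw with correlation rho between ALR coordinates."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    expd = [math.exp(z1), math.exp(z2), 1.0]
    total = sum(expd)
    return [e / total for e in expd]


rng = random.Random(42)
n = 20000
dir_samples = [dirichlet_draw(rng, [2.0, 3.0, 5.0]) for _ in range(n)]
ln_samples = [logistic_normal_draw(rng, 0.95) for _ in range(n)]

# Correlation between part 1 and part 2 under each distribution.
dir_corr = pearson([s[0] for s in dir_samples], [s[1] for s in dir_samples])
ln_corr = pearson([s[0] for s in ln_samples], [s[1] for s in ln_samples])
print("Dirichlet:", dir_corr, "logistic-normal:", ln_corr)
```

Under the Dirichlet the correlation between any two parts is negative whatever alpha is chosen, while the logistic-normal can produce parts that move together, which is the extra flexibility described above.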
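
For point 2, a rough sketch of what "a model on log-ratio transformed data" can look like is below: simulate compositions whose ALR coordinates depend linearly on one continuous covariate, transform the observed compositions, and fit an ordinary least-squares line to each coordinate. Everything here (the coefficients, the noise level, the helper names) is assumed for illustration, and it covers only fixed effects with a single covariate. Random effects would need a dedicated mixed-model library, which I have not sketched.

```python
import math
import random


def alr(x):
    """Additive log-ratio relative to the last part."""
    ref = x[-1]
    return [math.log(xi / ref) for xi in x[:-1]]


def alr_inv(y):
    """Inverse ALR back to a composition summing to 1."""
    expd = [math.exp(yi) for yi in y] + [1.0]
    total = sum(expd)
    return [e / total for e in expd]


def fit_line(t, y):
    """Ordinary least squares for y = intercept + slope * t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    sxx = sum((ti - mt) ** 2 for ti in t)
    sxy = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return my - slope * mt, slope


# Hypothetical true (intercept, slope) for each of the two ALR coordinates.
true_coefs = [(0.5, 1.0), (-0.3, -2.0)]

rng = random.Random(1)
t_values, compositions = [], []
for _ in range(500):
    t = rng.uniform(0.0, 1.0)
    y = [a + b * t + rng.gauss(0.0, 0.1) for a, b in true_coefs]
    t_values.append(t)
    compositions.append(alr_inv(y))  # what we would actually observe

# Transform the observed compositions and regress each coordinate on t.
coords = [alr(x) for x in compositions]
fits = [fit_line(t_values, [c[k] for c in coords]) for k in range(2)]
print(fits)
```

The fitted coefficients are on the log-ratio scale, so a slope describes how the log of one part relative to the reference part changes with the covariate.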