Solved – Why is the isometric log-ratio (ilr) transformation preferred over the additive (alr) or centered (clr) transformation with compositional data?

compositional-data, regression

I'm doing linear regression on census data, applying log-ratio transformations to the compositional variables. The IVs are compositional (percents summing to 100). The DV is non-compositional and continuous.

The alr and clr results are more easily interpreted, and all three transformations produce the same measure of fit. I'm inclined to go with alr (or clr). Aitchison characterizes ilr as the "pure mathematics" approach, but my audience is not statisticians or mathematicians.

If my objective is only to communicate insight from the analysis, why should I go with the much more difficult to interpret ilr (with balances) approach?

I've read heaps of research by Aitchison, Juan Jose Egozcue and Vera Pawlowsky-Glahn, but I'm not looking to debate.

Best Answer

Continuing from marianess's answer, clr is really not suitable because of the collinearity issue: the clr coordinates of every sample sum to zero, so the transformed variables are linearly dependent. In other words, if you try to make inferences from clr-transformed data, you may fall into the trap of trying to infer increases or decreases of individual variables, which you can never do with proportions in the first place.
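To make the collinearity point concrete, here is a minimal sketch using only the Python standard library. The three compositions are made-up numbers, not census data, and the `clr` helper is written out by hand for illustration rather than taken from any particular library. It shows that the clr coordinates of each sample add up to zero, which is what makes a design matrix built from all clr columns rank-deficient.

```python
import math


def clr(composition):
    """Centered log-ratio: log of each part minus the log of the geometric mean."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean of the parts
    return [value - mean_log for value in logs]


# Hypothetical three-part compositions (percentages summing to 100).
samples = [
    [60.0, 30.0, 10.0],
    [20.0, 50.0, 30.0],
    [45.0, 45.0, 10.0],
]

clr_rows = [clr(row) for row in samples]
row_sums = [sum(row) for row in clr_rows]

for row, total in zip(clr_rows, row_sums):
    # Each total is zero up to floating-point error.
    print([round(v, 4) for v in row], "sum:", abs(round(total, 10)))
```

Because every row sums to zero, any one clr column is the negative of the sum of the others, so a regression that includes all of them cannot assign a unique coefficient to each part.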

The ilr transformation attempts to resolve this by sticking to ratios of partitions, since ratios are stable quantities. These partitions can be represented as a tree, where each internal node represents the log ratio of the geometric means of its two subtrees. These log ratios of subtrees are known as balances.
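As a rough illustration of a balance, the sketch below computes the log ratio of geometric means for the two subtrees under a node, scaled by the usual normalizing factor sqrt(r*s/(r+s)), where r and s are the numbers of parts in each subtree. The four-part composition, the tree ((A, B), (C, D)), and the helper names `geometric_mean` and `balance` are all assumptions made up for this example; it uses only the standard library and is not the scikit-bio implementation.

```python
import math


def geometric_mean(values):
    """Geometric mean computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(composition, numerator, denominator):
    """Balance between two groups of parts, given as lists of indices.

    numerator and denominator are the parts under the two subtrees of a node.
    """
    r, s = len(numerator), len(denominator)
    g_num = geometric_mean([composition[i] for i in numerator])
    g_den = geometric_mean([composition[i] for i in denominator])
    return math.sqrt(r * s / (r + s)) * math.log(g_num / g_den)


# Hypothetical four-part composition (A, B, C, D) with the tree ((A, B), (C, D)).
x = [40.0, 10.0, 30.0, 20.0]

root = balance(x, [0, 1], [2, 3])  # (A, B) versus (C, D)
left = balance(x, [0], [1])        # A versus B
right = balance(x, [2], [3])       # C versus D

print(root, left, right)

# Balances depend only on ratios, so rescaling the whole composition
# (for example from percentages to proportions) leaves them unchanged.
x_scaled = [part / 100.0 for part in x]
print(balance(x_scaled, [0, 1], [2, 3]))
```

A four-part composition yields three balances, one per internal node of the tree, and each one can be read as "how much of this group relative to that group", which is the interpretation the tree is meant to support.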

I'd also recommend checking out these publications, since they all have nice explanations of how to interpret the ilr transform.

http://msystems.asm.org/content/2/1/e00162-16

https://peerj.com/articles/2969/

https://elifesciences.org/content/6/e21887

Here is an IPython notebook that goes into the details of how to calculate balances given a tree.

I also gave a description of how to do this with the modules in scikit-bio here, in case you're curious.
