Solved – Why is the isometric log-ratio (ilr) transformation preferred over the additive (alr) or centered (clr) transformation with compositional data

compositional-data regression

I'm doing linear regression on compositional data using log-ratio transformation with census data. The IVs are compositional (percents summing to 100). The DV is non-compositional and continuous.

The alr and clr results are more easily interpreted. They all produce the same measure of fit. I'm inclined to go with alr (or clr). Aitchison characterizes ilr as the "pure mathematics" approach, but my audience is not statisticians or mathematicians.

If my objective is only to communicate insight from the analysis, why should I go with the much more difficult to interpret ilr (with balances) approach?

I've read heaps of research by Aitchison, Juan Jose Egozcue and Vera Pawlowsky-Glahn, but I'm not looking to debate.

Best Answer

Continuing off of marianess's answer, clr is really not suitable due to the collinearity issue: the clr components of every observation sum to zero, so they are linearly dependent. In other words, if you try to make inferences with clr-transformed data, you may fall into the trap of trying to infer increases or decreases of individual variables, which you can never do with proportions in the first place.
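A quick way to see the collinearity is to fit the same regression with clr and with alr predictors. The following is only a sketch on simulated data; the names `comp`, `dv`, `clr` and `alr` are made up for illustration and are not from the question's census data.

```r
set.seed(1)
n <- 200
# Hypothetical 3-part composition (rows sum to 1) and a continuous response.
raw  <- matrix(rgamma(n * 3, shape = 2), nrow = n)
comp <- raw / rowSums(raw)
dv   <- 1 + 2 * log(comp[, 1] / comp[, 3]) + rnorm(n)

# clr: logs centered by their row mean; alr: log ratios against the last part.
clr <- log(comp) - rowMeans(log(comp))
alr <- log(comp[, 1:2] / comp[, 3])

# Every clr row sums to zero, so the clr columns are linearly dependent.
max_row_sum <- max(abs(rowSums(clr)))

fit_clr <- lm(dv ~ clr)   # lm() cannot estimate all three clr slopes
fit_alr <- lm(dv ~ alr)

n_dropped <- sum(is.na(coef(fit_clr)))   # number of aliased (NA) coefficients
r2_clr <- summary(fit_clr)$r.squared
r2_alr <- summary(fit_alr)$r.squared
```

`lm()` reports one clr slope as `NA` because of the exact dependence, while the two fits share the same R-squared, consistent with the question's remark that the measure of fit does not depend on the transformation.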

The ilr transformation attempts to resolve this by sticking to ratios of partitions, since ratios are stable quantities. These partitions can be represented as trees, where each internal node represents the log ratio of the geometric means of its two subtrees. These log ratios of subtrees are known as balances.
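As a rough illustration of a balance, the sketch below uses the usual normalized form, $\sqrt{rs/(r+s)}\,\ln\left(g(x_R)/g(x_S)\right)$, where $g$ is the geometric mean and $r$, $s$ are the sizes of the two groups. The helper names `geo_mean` and `balance`, the four-part composition and the tree ((a, b), (c, d)) are all assumptions chosen for the example.

```r
# Geometric mean of a vector of positive values.
geo_mean <- function(v) exp(mean(log(v)))

# Balance between the parts named in `num` and the parts named in `den`.
balance <- function(x, num, den) {
  r <- length(num)
  s <- length(den)
  sqrt(r * s / (r + s)) * log(geo_mean(x[num]) / geo_mean(x[den]))
}

# Hypothetical 4-part composition with the tree ((a, b), (c, d)).
x <- c(a = 0.1, b = 0.2, c = 0.3, d = 0.4)

b_root  <- balance(x, c("a", "b"), c("c", "d"))  # {a, b} versus {c, d}
b_left  <- balance(x, "a", "b")                  # a versus b
b_right <- balance(x, "c", "d")                  # c versus d

# Balances depend only on ratios, so rescaling the whole composition
# (percentages instead of proportions) changes nothing.
b_root_pct <- balance(100 * x, c("a", "b"), c("c", "d"))
```

Each internal node of the tree yields one balance, so a composition with four parts is described by three balances.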

I'd also recommend checking out these publications, since they all have nice explanations of how to interpret the ilr transform.

http://msystems.asm.org/content/2/1/e00162-16

https://peerj.com/articles/2969/

https://elifesciences.org/content/6/e21887

Here is an IPython notebook that goes into the details of how to calculate balances given a tree.

I also gave a description of how to do this with the modules in scikit-bio here, in case you're curious.

Related Solutions

Solved – the reason the log transformation is used with right-skewed distributions

Economists (like me) love the log transformation. We especially love it in regression models, like this: \begin{align} \ln{Y_i} &= \beta_1 + \beta_2 \ln{X_i} + \epsilon_i \end{align}

Why do we love it so much? Here is the list of reasons I give students when I lecture on it:

1. It respects the positivity of $Y$. Many times in real-world applications in economics and elsewhere, $Y$ is, by nature, a positive number. It might be a price, a tax rate, a quantity produced, a cost of production, spending on some category of goods, etc. The predicted values from an untransformed linear regression may be negative. The predicted values from a log-transformed regression can never be negative. They are $\widehat{Y}_j=\exp{\left(\beta_1 + \beta_2 \ln{X_j}\right)} \cdot \frac{1}{N} \sum \exp{\left(e_i\right)}$ (See an earlier answer of mine for derivation).
2. The log-log functional form is surprisingly flexible. Notice: \begin{align} \ln{Y_i} &= \beta_1 + \beta_2 \ln{X_i} + \epsilon_i \\ Y_i &= \exp{\left(\beta_1 + \beta_2 \ln{X_i}\right)}\cdot\exp{\left(\epsilon_i\right)}\\ Y_i &= \left(X_i\right)^{\beta_2}\exp{\left(\beta_1\right)}\cdot\exp{\left(\epsilon_i\right)}\\ \end{align} Which gives us a family of curves (figure not reproduced here). That's a lot of different shapes. A line (whose slope would be determined by $\exp{\left(\beta_1\right)}$, so which can have any positive slope), a hyperbola, a parabola, and a "square-root-like" shape. I've drawn it with $\beta_1=0$ and $\epsilon=0$, but in a real application neither of these would be true, so that the slope and the height of the curves at $X=1$ would be controlled by those rather than set at 1.
3. As TrynnaDoStat mentions, the log-log form "draws in" big values which often makes the data easier to look at and sometimes normalizes the variance across observations.
4. The coefficient $\beta_2$ is interpreted as an elasticity. It is the percentage increase in $Y$ from a one percent increase in $X$.
5. If $X$ is a dummy variable, you include it without logging it. In this case, $\beta_2$ is the percent difference in $Y$ between the $X=1$ category and the $X=0$ category.
6. If $X$ is time, again you include it without logging it, typically. In this case, $\beta_2$ is the growth rate in $Y$---measured in whatever time units $X$ is measured in. If $X$ is years, then the coefficient is annual growth rate in $Y$, for example.
7. The slope coefficient, $\beta_2$, becomes scale-invariant. This means, on the one hand, that it has no units, and, on the other hand, that if you re-scale (i.e. change the units of) $X$ or $Y$, it will have absolutely no effect on the estimated value of $\beta_2$. Well, at least with OLS and other related estimators.
8. If your data are log-normally distributed, then the log transformation makes them normally distributed. Normally distributed data have lots going for them.
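Several of these points (the elasticity reading of $\beta_2$, its scale invariance, and the always-positive predictions) can be checked on simulated data. The sketch below is illustrative only: the true elasticity of 0.5, the sample size and all variable names are assumptions, not values from the answer.

```r
set.seed(42)
n <- 500
# Simulated data with a true elasticity of 0.5 (all values are made up).
x <- exp(rnorm(n, mean = 2, sd = 0.7))
y <- exp(1 + 0.5 * log(x) + rnorm(n, sd = 0.3))

fit <- lm(log(y) ~ log(x))
elasticity <- unname(coef(fit)[2])

# Changing the units of x (say, dollars to cents) shifts the intercept
# but leaves the slope untouched.
fit_cents <- lm(log(y) ~ log(100 * x))
elasticity_cents <- unname(coef(fit_cents)[2])

# Predictions on the original scale: exponentiate the fitted line and
# multiply by the average exponentiated residual, as in the formula for
# the predicted values given in the list above.
smear <- mean(exp(residuals(fit)))
x_new <- c(1, 10, 100)
y_hat <- unname(exp(coef(fit)[1] + coef(fit)[2] * log(x_new)) * smear)
```

Because the prediction is an exponential times a positive average, `y_hat` cannot be negative whatever the fitted coefficients are.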

Statisticians generally find economists over-enthusiastic about this particular transformation of the data. This, I think, is because they judge my point 8 and the second half of my point 3 to be very important. Thus, in cases where the data are not log-normally distributed or where logging the data does not result in the transformed data having equal variance across observations, a statistician will tend not to like the transformation very much. The economist is likely to plunge ahead anyway since what we really like about the transformation are points 1, 2, and 4-7.

Solved – How to perform isometric log-ratio transformation

The ILR (Isometric Log-Ratio) transformation is used in the analysis of compositional data. Any given observation is a set of positive values summing to unity, such as the proportions of chemicals in a mixture or proportions of total time spent in various activities. The sum-to-unity invariant implies that although there may be $k\ge 2$ components to each observation, there are only $k-1$ functionally independent values. (Geometrically, the observations lie on a $k-1$-dimensional simplex in $k$-dimensional Euclidean space $\mathbb{R}^k$. This simplicial nature is manifest in the triangular shapes of the scatterplots of simulated data shown below.)

Typically, the distributions of the components become "nicer" when log transformed. This transformation can be scaled by dividing all values in an observation by their geometric mean before taking the logs. (Equivalently, the logs of the data in any observation are centered by subtracting their mean.) This is known as the "Centered Log-Ratio" transformation, or CLR. The resulting values still lie within a hyperplane in $\mathbb{R}^k$, because the scaling causes the sum of the logs to be zero. The ILR consists of choosing any orthonormal basis for this hyperplane: the $k-1$ coordinates of each transformed observation become its new data. Equivalently, the hyperplane is rotated (or reflected) to coincide with the plane with vanishing $k^\text{th}$ coordinate and one uses the first $k-1$ coordinates. (Because rotations and reflections preserve distance they are isometries, whence the name of this procedure.)
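To make the two steps concrete, here is a minimal sketch for $k=3$. The two observations and the particular orthonormal basis `V` are assumptions picked for the example; any other orthonormal basis of the zero-sum hyperplane would serve equally well.

```r
# A hypothetical 3-part composition, two observations (rows sum to 1).
x <- rbind(c(0.2, 0.3, 0.5),
           c(0.6, 0.3, 0.1))

# CLR: divide each row by its geometric mean, then take logs.
gm  <- exp(rowMeans(log(x)))
clr <- log(x / gm)

# One orthonormal basis (as rows) for the zero-sum hyperplane in R^3.
V <- rbind(c(1, -1,  0) / sqrt(2),
           c(1,  1, -2) / sqrt(6))

# ILR: coordinates of the CLR vectors with respect to that basis.
ilr_coords <- clr %*% t(V)

# Distances between observations agree in CLR and ILR coordinates.
d_clr <- sqrt(sum((clr[1, ] - clr[2, ])^2))
d_ilr <- sqrt(sum((ilr_coords[1, ] - ilr_coords[2, ])^2))
```

The CLR rows sum to zero, and `d_clr` equals `d_ilr` up to rounding, which is the isometry the name refers to.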

Tsagris, Preston, and Wood state that "a standard choice of [the rotation matrix] $H$ is the Helmert sub-matrix obtained by removing the first row from the Helmert matrix."

The Helmert matrix of order $k$ is constructed in a simple manner (see Harville p. 86 for instance). Its first row is all $1$s. The next row is one of the simplest that can be made orthogonal to the first row, namely $(1, -1, 0, \ldots, 0)$. Row $j$ is among the simplest that is orthogonal to all preceding rows: its first $j-1$ entries are $1$s, which guarantees it is orthogonal to rows $2, 3, \ldots, j-1$, and its $j^\text{th}$ entry is set to $1-j$ to make it orthogonal to the first row (that is, its entries must sum to zero). All rows are then rescaled to unit length.

Here, to illustrate the pattern, is the $4\times 4$ Helmert matrix before its rows have been rescaled:

$$\pmatrix{1&1&1&1 \\ 1&-1&0&0 \\ 1&1&-2&0 \\ 1&1&1&-3}.$$
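The construction just described can be written out directly. The function name `helmert` below is a hypothetical helper, not part of base R; the sketch assumes $k \ge 2$ and compares the result with base R's `contr.helmert`, which stores the same contrasts unscaled, negated and transposed.

```r
# Build the k x k Helmert matrix row by row, then normalize the rows.
helmert <- function(k) {
  H <- matrix(0, k, k)
  H[1, ] <- 1                      # first row: all ones
  for (j in 2:k) {
    H[j, seq_len(j - 1)] <- 1      # first j-1 entries are ones
    H[j, j] <- 1 - j               # makes the row sum to zero
  }
  H / sqrt(rowSums(H^2))           # rescale every row to unit length
}

H4 <- helmert(4)
is_orthonormal <- isTRUE(all.equal(H4 %*% t(H4), diag(4)))

# Helmert sub-matrix: drop the first row, leaving k-1 rows.
H_sub <- H4[-1, ]

# Same contrasts from base R, after negating, transposing and rescaling.
C <- -t(contr.helmert(4))
same_as_base_r <- isTRUE(all.equal(unname(C / sqrt(rowSums(C^2))), H_sub))
```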

(Edit added August 2017) One particularly nice aspect of these "contrasts" (which are read row by row) is their interpretability. The first row is dropped, leaving $k-1$ remaining rows to represent the data. The second row is proportional to the difference between the second variable and the first. The third row is proportional to the difference between the third variable and the first two. Generally, row $j$ ($2\le j \le k$) reflects the difference between variable $j$ and all those that precede it, variables $1, 2, \ldots, j-1$. This leaves the first variable $j=1$ as a "base" for all contrasts. I have found these interpretations helpful when following the ILR by Principal Components Analysis (PCA): it enables the loadings to be interpreted, at least roughly, in terms of comparisons among the original variables. I have inserted a line into the R implementation of ilr below that gives the output variables suitable names to help with this interpretation. (End of edit.)

Since R provides a function contr.helmert to create such matrices (albeit without the scaling, and with rows and columns negated and transposed), you don't even have to write the (simple) code to do it. Using this, I implemented the ILR (see below). To exercise and test it, I generated $1000$ independent draws from a Dirichlet distribution (with parameters $1,2,3,4$) and plotted their scatterplot matrix. Here, $k=4$.

The points all clump near the lower left corners and fill triangular patches of their plotting areas, as is characteristic of compositional data.

Their ILR has just three variables, again plotted as a scatterplot matrix:

This does indeed look nicer: the scatterplots have acquired more characteristic "elliptical cloud" shapes, better amenable to second-order analyses such as linear regression and PCA.

Tsagris et al. generalize the CLR by using a Box-Cox transformation, which generalizes the logarithm. (The log is a Box-Cox transformation with parameter $0$.) It is useful because, as the authors (correctly IMHO) argue, in many applications the data ought to determine their transformation. For these Dirichlet data a parameter of $1/2$ (which is halfway between no transformation and a log transformation) works beautifully:

"Beautiful" refers to the simple description this picture permits: instead of having to specify the location, shape, size, and orientation of each point cloud, we need only observe that (to an excellent approximation) all the clouds are circular with similar radii. In effect, the CLR has simplified an initial description requiring at least 16 numbers into one that requires only 12 numbers and the ILR has reduced that to just four numbers (three univariate locations and one radius), at a price of specifying the ILR parameter of $1/2$--a fifth number. When such dramatic simplifications happen with real data, we usually figure we're on to something: we have made a discovery or achieved an insight.

This generalization is implemented in the ilr function below. The command to produce these "Z" variables was simply

z <- ilr(x, 1/2)

One advantage of the Box-Cox transformation is its applicability to observations that include true zeros: it is still defined provided the parameter is positive.
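A small sketch of that behaviour at zero, using the same form of the Box-Cox transformation, $(x^p - 1)/p$, that appears in the code in this answer. The helper name `box_cox` and the example values are assumptions made for illustration.

```r
# Box-Cox transformation; p = 0 is the logarithm.
box_cox <- function(x, p) {
  if (p == 0) log(x) else (x^p - 1) / p
}

x <- c(0, 0.1, 0.4, 0.5)      # a composition containing a true zero

bc_log  <- box_cox(x, 0)      # -Inf for the zero part
bc_half <- box_cox(x, 1/2)    # finite everywhere; the zero maps to -1/p = -2

# For small positive p the transformation approaches the logarithm.
gap <- max(abs(box_cox(x[-1], 1e-6) - log(x[-1])))
```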

References

Michail T. Tsagris, Simon Preston and Andrew T.A. Wood, A data-based power transformation for compositional data. arXiv:1106.1451v2 [stat.ME] 16 Jun 2011.

David A. Harville, Matrix Algebra From a Statistician's Perspective. Springer Science & Business Media, Jun 27, 2008.

Here is the R code.

#
# ILR (Isometric log-ratio) transformation.
# `x` is an `n` by `k` matrix of positive observations with k >= 2.
#
ilr <- function(x, p=0) {
  y <- log(x)
  if (p != 0) y <- (exp(p * y) - 1) / p       # Box-Cox transformation
  y <- y - rowMeans(y, na.rm=TRUE)            # Recentered values
  k <- dim(y)[2]
  H <- contr.helmert(k)                       # Dimensions k by k-1
  H <- t(H) / sqrt((2:k)*(2:k-1))             # Dimensions k-1 by k
  z <- y %*% t(H)                             # Rotated/reflected values
  if(!is.null(colnames(x)))                   # (Helps with interpreting output)
    colnames(z) <- paste0(colnames(x)[-1], ".ILR")
  return(z)
}
#
# Specify a Dirichlet(alpha) distribution for testing.
#
alpha <- c(1,2,3,4)
#
# Simulate and plot compositional data.
#
n <- 1000
k <- length(alpha)
x <- matrix(rgamma(n*k, alpha), nrow=n, byrow=TRUE)
x <- x / rowSums(x)
colnames(x) <- paste0("X.", 1:k)
pairs(x, pch=19, col="#00000040", cex=0.6)
#
# Obtain the ILR.
#
y <- ilr(x)
colnames(y) <- paste0("Y.", 1:(k-1))
#
# Plot the ILR.
#
pairs(y, pch=19, col="#00000040", cex=0.6)
