Compositional Data Analysis – Comparing Paired Before and After Composition Proportions

compositional-data, intervention-analysis, paired-data, proportion, statistical-significance

I have some data on the composition of purchases by households, before and after an intervention. The composition is known as either the number of items or the amount spent in each of 7 categories, and either can be converted into a proportion of purchases or expenditure per category.

I would like to be able to say whether the mix of purchases has significantly changed as a result of the intervention. There are two sources of non-independence. First, households can be paired before and after. Second, the composition proportions sum to 1.0, so the proportion spent in, say, the fruit category isn't independent of the proportion spent in the sweets category. Is it best to do the analysis at the items/spend level or on the proportions? And what technique is most appropriate?

Best Answer

Is it best to do the analysis at the items/spend level or the proportions?

Either is possible. You need to decide which hypothesis you want to examine. My sense is that the "items/spend level" is probably more informative, since it lets you see whether overall spending was affected by the intervention. You then deal with the within-household correlations among categories by using multivariate (multiple-outcome) analysis, which even the R lm() function handles directly. This document introduces how to extend linear modeling to multivariate outcomes.

It's not clear from the question whether all of your households underwent the same intervention or whether there was a control group that didn't. If they all underwent the intervention, then what you have is equivalent to a set of paired t-tests: simple to calculate, but difficult to interpret without a control group. If there is a control group, or if you want to account for further covariates, then heed the cautions about analysis of changes in the page linked by @kjetil b halvorsen.
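As a minimal sketch of the spend-level analysis for the case where every household received the intervention: the data below are simulated, the names before, after and d are hypothetical, and only 3 of the 7 categories are shown for brevity. Fitting an intercept-only multivariate model to the within-household changes with lm() gives one estimated mean change per category, and each intercept's t statistic should match the corresponding paired t-test.

```r
# Simulated spend for 30 households in 3 categories (illustrative data only).
set.seed(1)
n <- 30
cats <- c("fruit", "sweets", "dairy")
before <- matrix(rgamma(n * 3, shape = 5, rate = 0.5), nrow = n,
                 dimnames = list(NULL, cats))
# Hypothetical intervention effect: fruit up, sweets down, dairy unchanged.
shift <- matrix(rnorm(n * 3, mean = c(1, -1, 0), sd = 2), nrow = n, byrow = TRUE)
after <- before + shift

# Within-household change per category (one row per household).
d <- after - before

# Intercept-only multivariate linear model: a matrix response makes lm()
# fit all categories at once (the result has class "mlm").
fit <- lm(d ~ 1)
print(coef(fit))  # estimated mean change per category

# t statistic of each intercept, taken from the per-response summaries.
t_lm <- sapply(summary(fit),
               function(s) s$coefficients["(Intercept)", "t value"])

# The same statistics from separate paired t-tests, category by category.
t_paired <- sapply(cats, function(k) {
  t.test(after[, k], before[, k], paired = TRUE)$statistic
})

print(data.frame(category = cats,
                 t_lm = unname(t_lm),
                 t_paired = unname(t_paired)))
```

With a control group, the same call could be extended to something like lm(d ~ group) for a hypothetical group factor, subject to the cautions about analysing changes noted above.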

There is an extensive literature on the analysis of compositional data, if your primary interest is in composition per se. The compositions package in R provides a set of transformations appropriate for compositional data, with a guide for choosing among them in this vignette.

Then you can just continue with multivariate analysis. As this document about compositions explains:

Linear models can use any of the given scales as regressors or as response. However we decided not to introduce special routines for that since one retains much more flexibility by using standard methods in conjunction with transformations.

The document then continues to show examples of how to proceed with compositional data as predictors or as outcomes.
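To illustrate the composition route, here is a rough base-R sketch under stated assumptions. The data are simulated, and every proportion is strictly positive (zeros would need special handling before taking logs). The additive log-ratio is coded by hand in a hypothetical helper, alr_by_hand, rather than taken from the compositions package, which provides alr(), clr() and ilr() for this purpose. The paired changes in log-ratio coordinates are then tested jointly with a one-sample Hotelling T² test. That is one standard multivariate option, not necessarily the one the vignette uses.

```r
# Simulated spend for 40 households in 3 categories (illustrative data only).
set.seed(42)
n <- 40
cats <- c("fruit", "sweets", "other")
before <- matrix(rgamma(n * 3, shape = 5, rate = 0.5), nrow = n,
                 dimnames = list(NULL, cats))
# Hypothetical effect: fruit spend up 30%, sweets down 20%, plus
# multiplicative household-level noise, so all values stay positive.
effect <- c(1.3, 0.8, 1.0)
noise <- matrix(exp(rnorm(n * 3, sd = 0.2)), nrow = n)
after <- sweep(before, 2, effect, "*") * noise

# Close each row so that the proportions sum to 1.
p_before <- before / rowSums(before)
p_after  <- after / rowSums(after)

# Hand-written additive log-ratio: log of each part relative to the last part.
alr_by_hand <- function(p) {
  log(p[, -ncol(p), drop = FALSE] / p[, ncol(p)])
}

# Paired change in log-ratio coordinates: n rows, (number of parts - 1) columns.
d <- alr_by_hand(p_after) - alr_by_hand(p_before)

# One-sample Hotelling T^2 test of "mean change is zero in every coordinate".
k <- ncol(d)
dbar <- colMeans(d)
S <- cov(d)
T2 <- n * drop(t(dbar) %*% solve(S) %*% dbar)
F_stat <- (n - k) / (k * (n - 1)) * T2
p_value <- pf(F_stat, df1 = k, df2 = n - k, lower.tail = FALSE)

print(c(T2 = T2, F = F_stat, p = p_value))
```

Because T² is unchanged by invertible linear re-expressions of the coordinates, using ilr instead of alr coordinates would give the same joint test. The choice of transformation matters more when interpreting individual coordinates.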
