Solved – Analysing data measured as proportional composition

compositional-data, multidimensional scaling, proportion

I have a data set on the proportional composition of marine substrate for different locations which I would like to compare. For example, one replicate transect within a location may be 50% sand, 25% seagrass, 25% rubble (there are 14 categories).

In general, how does one analyse such conditional proportional data, since each value for each variable (e.g. % sand) is dependent upon the other proportions?

If these were independent proportions, I would consider an ordination (e.g. NMDS) to examine clustering of replicate transects with respect to locations. Is this possible with conditional proportions? If so, how should I treat the proportions (standardisation, distance measure, etc.)?

Additionally, I am considering multiple regression to test whether site characteristics (substrate composition and location) influence the density of fish (number of individuals per square metre). Is this possible, or can someone suggest a suitable alternative?

When researching this question, all I could find was information about variables measured independently as proportions, which was no help!

Best Answer

There is a very large literature on such data, which since at least the 1980s have usually been called compositional.

The largest single fraction of this literature is quite possibly in mathematical geology. Other applications are numerous. One is the study of expenditure budgets in economics.

Everyone agrees that the constraint that fractions sum to 1 (percents sum to 100) gives analysis a special twist: there is one fewer piece of information than may at first appear. If you tell me 13 of the 14 fractions, the last follows from the constraint. Various transformations have been suggested for such data in the light of this. A common complication, however, is that many observed proportions are zero (or undetectable).
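To make the constraint and the zero problem concrete, here is a minimal sketch in plain Python. It uses Aitchison's centred log-ratio (clr) transform as one example of the transformations mentioned above; the answer does not name a specific one, so that choice is an assumption. The zero handling (`replace_zeros`, with a hypothetical default `delta=1e-3`) is a deliberately naive stand-in for the more careful replacement methods in the literature. Euclidean distance between clr-transformed compositions (the Aitchison distance) is the kind of distance one could then feed to an ordination such as NMDS. Requires Python 3.8+ for `math.dist`.

```python
import math

def close(parts):
    """Rescale non-negative parts so they sum to 1 (the 'closure' operation)."""
    total = sum(parts)
    return [p / total for p in parts]

def last_part(known):
    """All but one fraction is known; the sum-to-1 constraint fixes the last one."""
    return 1.0 - sum(known)

def replace_zeros(parts, delta=1e-3):
    """Naive zero replacement (an assumption for illustration only):
    swap zeros for a small value delta, then re-close so the parts sum to 1."""
    return close([p if p > 0 else delta for p in parts])

def clr(parts):
    """Centred log-ratio: log of each part minus the mean log of all parts
    (i.e. log of each part relative to their geometric mean). Needs all parts > 0."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def aitchison_distance(x, y):
    """Euclidean distance between two clr-transformed compositions."""
    return math.dist(clr(x), clr(y))

# Example: one transect with 14 substrate categories, most of them zero.
transect = [0.50, 0.25, 0.25] + [0.0] * 11
print(last_part(transect[:13]))  # prints 0.0: the 14th fraction is determined by the other 13
positive = replace_zeros(transect)
print(clr(positive))             # unconstrained coordinates that sum to zero
```

Because clr depends only on ratios between parts, rescaling a composition (percents versus fractions) leaves it unchanged, which is why it sidesteps the sum constraint. The price is that it cannot take logs of zeros, hence the replacement step.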

Look at your local version of Amazon for texts, searching for "compositional data". I have not read

but it is likely to be fairly strong on what you should do and to be well linked to the rest of the literature.

In a way, there is no mystery here. Much more widely known is the case of two proportions which must sum to 1, such as fraction present and fraction absent or fraction survived and fraction not survived. Here it is not usually considered necessary to spell out that only one proportion need be analysed. Compositional data are just an extension of that.
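The link between the two-proportion case and the general case can be sketched as follows, assuming the additive log-ratio (alr) transform as the generalisation (one common choice, not necessarily the one the cited literature favours). With two parts, alr reduces to the familiar logit of a single proportion; with D parts it gives D - 1 unconstrained coordinates. Using the last part as the reference is an arbitrary assumption.

```python
import math

def alr(parts):
    """Additive log-ratio: log of each part relative to the last (reference) part.
    A D-part composition maps to D - 1 unconstrained coordinates."""
    ref = parts[-1]
    return [math.log(p / ref) for p in parts[:-1]]

def logit(p):
    """Log-odds of a single proportion p, with 0 < p < 1."""
    return math.log(p / (1 - p))

# Two parts: fraction survived and fraction not survived.
survived = 0.8
two_part = [survived, 1 - survived]
print(alr(two_part))               # one coordinate, equal to the logit below
print(logit(survived))

# Three parts: two coordinates, one fewer than the number of parts.
print(alr([0.50, 0.25, 0.25]))
```

In both cases only D - 1 numbers are analysed, mirroring the point that only one of two complementary proportions needs analysing.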
