Generalized Additive Models – Analyzing Non-linear Data with GAM Regression: Exponential or Logarithmic Curves?

Tags: ecology, generalized-additive-model, nonlinear regression

I am trying to publish my Master's thesis, which looked at elephant impacts on vegetation, with a focus on the effect of artificial waterholes. I have made my initial submission and am now working on a major revision, in which some of my statistical approaches have been questioned.

Some of the elephant impact measures I looked at showed a non-linear relationship with distance to water, which led me to use GAMs. However, a reviewer feels this is statistical overkill and achieves little more than "fitting complicated nonlinear functions that demonstrate that distance matters". They suggest I should instead describe the non-linear relationships between elephant impacts and distance to water by fitting simple functions that others can use (e.g. exponential decay curves). I think the point being made is that these simpler functions would allow me to say something more generalisable about the elephant impacts (e.g. that they resemble a logarithmic curve), whereas the GAMs only describe the impacts I found in my particular study area and are thus less generally useful.

I'm still a novice at statistics so I would appreciate some thoughts on which approach seems more sensible. I have included some examples from my data below.

The top graph shows canopy volume/ha in one vegetation type at different distances from water (the circles represent individual sampling plots). The line shows the predicted relationship from the GAM regression. This relationship appears to resemble a logarithmic curve, so the reviewer would perhaps recommend that I fit one in a case like this.

[Figure: Canopy volume/ha in one vegetation type at different distances from water (circles represent individual sampling plots)]

[Figure: Canopy volume/ha in the second vegetation type at different distances from water (circles represent individual sampling plots)]

The second graph shows canopy volume/ha in relation to distance to water, but in a second vegetation type. This vegetation type was more heavily impacted by elephants. Again, the line on the graph is derived from the GAM prediction, but perhaps this is a case where I could fit an exponential curve?

I'd appreciate any thoughts on the merits of my use of GAMs versus the reviewer's recommendation to use simpler functions. As an aside, I also have not really explored fitting logarithmic or exponential curves to data in R, so I would also much appreciate any pointers in that regard.

Best Answer

In addition to Demetri's answer (+1):

  1. The use of GAMs is well established in ecology, so I would cite some of the standard books or influential articles. Show that you are not reinventing the wheel but are abreast of modern modelling approaches.
  2. You do not describe your sample size, but you might want to try a validation scheme to show that GAMs give you a better goodness of fit. While hand-wavy, if something like AIC/BIC shows a clear preference for a particular model, this can pacify some (not too sophisticated) criticism.
  3. I would emphasise how the GAM fitting procedure incorporates shrinkage: the smooth terms are penalised for excess wiggliness. It is plausible that someone has oversimplified GAMs in their head as "a polynomial basis of sorts" and therefore as prone to overfitting.
  4. Take their viewpoint for a moment: are there any established studies that already suggest logarithmic or exponential decay curves? The reviewer might be satisfied if you acknowledge them as a possibility. Maybe you can make a critical assessment of that prior work and show how your work is a step forward.
  5. As Demetri mentioned, specifying a functional form without prior knowledge can induce strong bias. You can politely double down on the fact that you are using a non-parametric approach. Maybe even try different basis functions (e.g. cubic regression splines and thin-plate splines) and show that the results are (hopefully) very similar and thus not dependent on the choice of basis functions.
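
On the side question of how to fit the reviewer's simple curves and compare them (points 2 and 4 above), here is a minimal sketch. It is written in Python with only the standard library rather than in R, and everything in it is an assumption made for illustration: the data are made up (not the thesis data), the helper names `ols`, `fit_log_curve`, `fit_exp_curve` and `aic` are hypothetical, and the exponential curve is fitted by ordinary least squares on log(y), which assumes positive responses and multiplicative errors. Because of that last shortcut, the exponential AIC is only a rough guide; a proper nonlinear least-squares fit on the original scale would be more rigorous.

```python
import math

def ols(u, v):
    """Least-squares (intercept, slope) for v = c0 + c1 * u."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    sxx = sum((ui - mu) ** 2 for ui in u)
    sxy = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    slope = sxy / sxx
    return mv - slope * mu, slope

def fit_log_curve(x, y):
    """y = a + b * log(x): linear in a and b, so OLS applies directly."""
    a, b = ols([math.log(xi) for xi in x], y)
    return lambda xi: a + b * math.log(xi)

def fit_exp_curve(x, y):
    """y = a * exp(b * x), fitted by OLS on log(y); requires y > 0."""
    log_a, b = ols(x, [math.log(yi) for yi in y])
    a = math.exp(log_a)
    return lambda xi: a * math.exp(b * xi)

def aic(x, y, predict, n_params):
    """Gaussian AIC (up to a constant) from the residual sum of squares."""
    n = len(y)
    rss = sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))
    return n * math.log(rss / n) + 2 * (n_params + 1)  # +1 for error variance

# Made-up illustrative data (NOT from the thesis): distance to water in km
# and a response that rises and then flattens with distance.
distance = [0.5 * i for i in range(1, 11)]
canopy = [200 + 150 * math.log(d) + 10 * math.sin(7 * i)
          for i, d in enumerate(distance)]

log_fit = fit_log_curve(distance, canopy)
exp_fit = fit_exp_curve(distance, canopy)
aic_log = aic(distance, canopy, log_fit, n_params=2)
aic_exp = aic(distance, canopy, exp_fit, n_params=2)
print(f"AIC, logarithmic curve: {aic_log:.1f}")
print(f"AIC, exponential curve: {aic_exp:.1f}")
```

The lower AIC indicates the better-supported curve for these data. A GAM can be placed in the same comparison by using its effective degrees of freedom in place of `n_params`. In R, the corresponding tools would be `nls()` for the parametric curves and `gam()` from the mgcv package for the smooth.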

Just to be clear: in my opinion, using GAMs is the correct approach here; the criticism of "why not functional form X" is weak. Such criticism might be warranted if prior research provided robust evidence for a particular modelling assumption, but even then it would not be a particularly strong position to take. That said, try to see where they are coming from too; criticism can help strengthen your manuscript and alleviate the worries of future readers.
