Comparing model fits across a set of nonlinear regression models

aic, nonlinear regression

CONTEXT:
I am modelling the relation between time (1 to 30) and a DV for a set of 60 participants. Each participant has their own time series.
For each participant I am examining the fit of 5 different theoretically plausible functions within a nonlinear regression framework.
One function has one parameter; three functions have three parameters; and one function has five parameters.

I want to use a decision rule to determine which function provides the most "theoretically meaningful" fit.
However, I don't want to reward over-fitting.

Over-fitting seems to come in two varieties. One form is the standard sense whereby an additional parameter enables slightly more of the random variance to be explained. A second sense is where there is an outlier or some other slight systematic effect, which is of minimal theoretical interest. Functions with more parameters sometimes seem capable of capturing these anomalies and get rewarded.

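I initially used AIC, and I have also experimented with increasing the penalty for parameters.
In addition to the standard penalty of $2k$, as in $\mathit{AIC} = 2k + n[\ln(2\pi \mathit{RSS}/n) + 1]$ (where $k$ is the number of parameters, $n$ the number of observations, and $\mathit{RSS}$ the residual sum of squares),
I've also tried a penalty of $6k$ (what I call AICPenalised).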
I have inspected scatter plots with the fitted lines superimposed, alongside the corresponding recommendations based on AIC and AICPenalised. Both AIC and AICPenalised provide reasonable recommendations, and they agree about 80% of the time. However, where they disagree, AICPenalised seems to make recommendations that are more theoretically meaningful.
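
To make the two criteria concrete, here is a minimal sketch of how they can be computed for one participant. The function name `gaussian_aic`, the model labels, and the RSS values are all hypothetical and chosen only for illustration; they are not taken from my data.

```python
import math

def gaussian_aic(rss, n, k, penalty=2.0):
    """AIC-style criterion: penalty * k + n * [ln(2 * pi * RSS / n) + 1].

    penalty=2 gives the usual AIC; penalty=6 gives what the post calls AICPenalised.
    """
    return penalty * k + n * (math.log(2 * math.pi * rss / n) + 1)

# Hypothetical fits for one participant: name -> (RSS, number of parameters).
fits = {
    "one_param": (12.0, 1),
    "three_param": (9.5, 3),
    "five_param": (8.0, 5),
}
n = 30  # time points per participant

best_by_penalty = {}
for penalty in (2, 6):
    scores = {name: gaussian_aic(rss, n, k, penalty) for name, (rss, k) in fits.items()}
    best_by_penalty[penalty] = min(scores, key=scores.get)  # lower is better

print(best_by_penalty)
```

With these made-up RSS values the two penalties pick different functions, which is the kind of disagreement described above.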

QUESTION:
Given a set of nonlinear regression function fits:

  • What is a good criterion for deciding on a best fitting function in nonlinear regression?
  • What is a principled way of adjusting the penalty for number of parameters?

Best Answer

For each participant, compute the leave-one-out cross-validated prediction error for each functional form, and assign the participant the form with the smallest error. That should do something to keep the overfitting under control.
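
A rough sketch of that procedure for a single participant, assuming NumPy and SciPy are available. The three candidate functions, their starting values, and the simulated series are hypothetical stand-ins, not the five functions from the question; a fit that fails to converge is simply counted against its form.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical candidate forms (placeholders for the question's five functions).
def decay_one(t, b):
    return np.exp(-b * t)

def linear(t, a, b):
    return a + b * t

def exp_decay(t, a, b, c):
    return a * np.exp(-b * t) + c

# name -> (function, assumed starting values for the optimiser)
CANDIDATES = {
    "decay_one": (decay_one, [0.1]),
    "linear": (linear, [1.0, 0.0]),
    "exp_decay": (exp_decay, [4.0, 0.3, 0.5]),
}

def loo_mse(func, p0, t, y):
    """Mean squared leave-one-out prediction error for one functional form."""
    sq_errors = []
    for i in range(len(t)):
        keep = np.arange(len(t)) != i  # drop observation i
        try:
            params, _ = curve_fit(func, t[keep], y[keep], p0=p0, maxfev=10000)
        except RuntimeError:
            return np.inf  # non-convergence counts against this form
        sq_errors.append((y[i] - func(t[i], *params)) ** 2)
    return float(np.mean(sq_errors))

# Simulated series for one participant (made-up values for illustration).
rng = np.random.default_rng(0)
t = np.arange(1, 31, dtype=float)
y = 5.0 * np.exp(-0.2 * t) + 1.0 + rng.normal(0.0, 0.1, size=t.size)

errors = {name: loo_mse(f, p0, t, y) for name, (f, p0) in CANDIDATES.items()}
best = min(errors, key=errors.get)  # form with the smallest prediction error
print(errors, best)
```

Running the same loop over all 60 participants gives one selected form per participant. No explicit parameter penalty is needed, because extra parameters only help if they improve prediction of the held-out point.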

That approach ignores higher-level problem structure: the population has groups that are assumed to share a functional form, so data from one participant with a particular form is potentially useful for estimating the parameters of another participant with the same form. But it's a start, if not a finish, for the analysis.