Solved – What are the pros and cons of fitting data with simple polynomial regression vs. a complicated ODE model

Suppose we are in a disease outbreak scenario and want to estimate the number of infected people from infection counts observed over time.

Why can't we simply fit the data with a polynomial (or an MLP neural network)?

What are the advantages of using a more complicated model, such as the SIR model, which is built on ODEs?

(The attached code and plot show an example of fitting a high-order polynomial (red line) to data generated from an SIR model (black dots); we can see that the fit is almost perfect.)


# generate data from SIR Model
library(deSolve)  # provides ode()

N <- 1000
init <- c(S = 999, I = 1, R = 0)

SIR <- function(time, state, parameters) {
  par <- as.list(c(state, parameters))
  with(par, {
    dS <- -beta * (S/N) * I              # susceptibles who become infected
    dI <- beta * (S/N) * I - gamma * I   # new infections minus recoveries
    dR <- gamma * I                      # recoveries
    list(c(dS, dI, dR))
  })
}
out <- ode(init, seq(1000), func = SIR, parms = c(beta=0.1, gamma=0.01))

# fit with high order polynomial
d = data.frame(out[50:300,])  # keep time points 50 to 300 as the data to fit
names(d) = c('time', 'susceptible', 'infected', 'recovered')
poly_fit  = lm(infected~poly(time,15),d)
plot(d$time, d$infected)
lines(d$time, predict(poly_fit, d), col ='red', lwd = 3)

Best Answer

Just extend the time range a little, and we can see how terrible the polynomial fit is:

plot(seq(30,320), predict(poly_fit, data.frame(time = seq(30,320))), type='l')
points(d$time, d$infected)

From a machine learning perspective, we would say that the polynomial fit is overfitting.

  • In the SIR model, the differential equations describe the underlying physical laws and the interactions between the variables.

  • The curve-fitting approach, by contrast, just tries to minimize the loss using many parameters that have no physical meaning. As a result, we get a minimized loss (a near-perfect fit) on the training data, but the fitted function does not describe any of the underlying physics.
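
The contrast between the two bullets can be sketched by fitting both models to the same window and scoring them just outside it. This is only an illustration under stated assumptions: the data are regenerated with the question's settings, the SIR parameters are estimated by plain least squares with `optim` (one possible choice, not something the question or answer prescribes), and the helper names `simulate_I`, `sse` and `rmse`, the starting guess, and the "outside" time points are made up for this example.

```r
library(deSolve)

N <- 1000
init <- c(S = 999, I = 1, R = 0)
true_parms <- c(beta = 0.1, gamma = 0.01)   # values used in the question

SIR <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    dS <- -beta * (S / N) * I
    dI <- beta * (S / N) * I - gamma * I
    dR <- gamma * I
    list(c(dS, dI, dR))
  })
}

# Hypothetical helper: infected counts at times 1..t_max for given (beta, gamma)
simulate_I <- function(parms, t_max) {
  as.data.frame(ode(init, seq(t_max), func = SIR, parms = parms))$I
}

truth  <- simulate_I(true_parms, 320)
window <- 50:300                       # training window from the question
train  <- data.frame(time = window, infected = truth[window])

# Polynomial: 16 coefficients with no physical interpretation
poly_fit <- lm(infected ~ poly(time, 15), data = train)

# SIR: 2 parameters, optimised on the log scale so they stay positive
sse <- function(log_parms) {
  parms <- c(beta = exp(log_parms[1]), gamma = exp(log_parms[2]))
  sum((simulate_I(parms, 300)[window] - train$infected)^2)
}
fit <- optim(log(c(0.15, 0.02)), sse)  # rough starting guess (assumption)
sir_parms <- c(beta = exp(fit$par[1]), gamma = exp(fit$par[2]))

rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
outside <- c(30:49, 301:320)           # times just beyond the training window

sir_pred  <- simulate_I(sir_parms, 320)
poly_pred <- predict(poly_fit, data.frame(time = 1:320))

# Root mean squared error of each model inside and outside the window
errors <- rbind(
  inside  = c(poly = rmse(poly_pred[window],  truth[window]),
              sir  = rmse(sir_pred[window],   truth[window])),
  outside = c(poly = rmse(poly_pred[outside], truth[outside]),
              sir  = rmse(sir_pred[outside],  truth[outside]))
)
print(sir_parms)
print(errors)
```

If the optimiser converges, both models should look fine inside the window, while only the two-parameter SIR fit should stay close to the data outside it. The estimated `beta` and `gamma` can also be read directly as transmission and recovery rates, which the polynomial coefficients cannot.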

As for pros and cons, SIR fitting vs. polynomial fitting is very similar to the discussion of parametric vs. non-parametric models.

For example, consider fitting data with a normal distribution versus using kernel density estimation.

  • If the data really come from a normal distribution, or mostly satisfy the model assumptions, then fitting a normal distribution is better than non-parametric estimation.

  • On the other hand, if the data are far from the model assumptions, say they contain a lot of outliers, then non-parametric methods will give better results.
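
A small simulation can make this trade-off concrete. The sketch below is not from the original answer: the sample sizes, the 15% cluster of outliers around 6, the evaluation grid, and the helper names `ise` and `compare_fits` are all assumptions chosen for illustration. It scores a fitted normal density and a default kernel density estimate by their integrated squared error against the true density.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

grid <- seq(-6, 12, length.out = 512)
step <- grid[2] - grid[1]

# Integrated squared error between an estimate and the true density on the grid
ise <- function(estimate, truth) sum((estimate - truth)^2) * step

compare_fits <- function(x, true_density) {
  # Parametric: normal density with estimated mean and standard deviation
  normal_est <- dnorm(grid, mean = mean(x), sd = sd(x))
  # Non-parametric: Gaussian kernel density estimate on the same grid
  kde_est <- density(x, from = min(grid), to = max(grid),
                     n = length(grid))$y
  truth <- true_density(grid)
  c(normal = ise(normal_est, truth), kde = ise(kde_est, truth))
}

# Case 1: the normal assumption holds
x_clean <- rnorm(1000)
clean <- compare_fits(x_clean, dnorm)

# Case 2: 15% of the sample is a cluster of outliers around 6
x_contaminated <- c(rnorm(850), rnorm(150, mean = 6, sd = 0.5))
mixture <- function(g) 0.85 * dnorm(g) + 0.15 * dnorm(g, mean = 6, sd = 0.5)
contaminated <- compare_fits(x_contaminated, mixture)

print(rbind(clean, contaminated))
```

When the assumption holds, the two-parameter normal fit would typically be expected to have the smaller error, although this can vary with the seed. With the contaminated sample, the fitted normal is stretched to cover the outliers and the kernel estimate should be clearly closer to the truth.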

A similar question has been asked:

What's wrong to fit periodic data with polynomials?

And one of the answers there still applies here:

Intuitively you want to fit function that (in some sense) looks like your underlying process. This way you'll have the fewest number of parameters to estimate. Say you have a round hole, and need to fit a cork into it. If your cork is square it's harder to fit it well than if the cork were round.