Solved – What are the pros and cons of fitting data with a simple polynomial regression vs. a complicated ODE model

differential-equations, fitting, machine-learning, polynomial, regression

Suppose that in a disease outbreak scenario we want to estimate the number of infected people based on observed infections over time.

Why can't we simply fit the data with a polynomial (or an MLP neural network)?

What are the advantages of using a more complicated model, such as the SIR model defined by ODEs?

(The attached code and plot show an example of fitting a high-order polynomial (red line) to data generated from an SIR model (black dots); we get an almost perfect fit.)

[Plot: degree-15 polynomial fit (red line) to SIR-generated infected counts (black dots)]


library(deSolve)

# generate data from an SIR model
N <- 1000                          # total population size
init <- c(S = 999, I = 1, R = 0)   # start with a single infected individual

SIR <- function(time, state, parameters) {
  par <- as.list(c(state, parameters))
  with(par, {
    dS <- -beta * (S / N) * I             # susceptibles becoming infected
    dI <- beta * (S / N) * I - gamma * I  # new infections minus recoveries
    dR <- gamma * I                       # infected individuals recovering
    list(c(dS, dI, dR))
  })
}

out <- ode(init, times = seq(1000), func = SIR,
           parms = c(beta = 0.1, gamma = 0.01))

# fit a high-order (degree-15) polynomial to the infected counts
d <- as.data.frame(out[50:300, ])
names(d) <- c('time', 'susceptible', 'infected', 'recovered')
poly_fit <- lm(infected ~ poly(time, 15), data = d)

plot(d$time, d$infected)
lines(d$time, predict(poly_fit, d), col = 'red', lwd = 3)
grid()

Best Answer

Just extend the time range a little and we can see how terrible the polynomial fit is:

plot(seq(30, 320), predict(poly_fit, data.frame(time = seq(30, 320))),
     type = 'l', col = 'red')
points(d$time, d$infected)
grid()

[Plot: the polynomial's predictions diverge sharply just outside the fitted time range]

From a machine learning perspective, we would say the polynomial fit is overfitting.
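We can quantify this with a quick, illustrative check (the hold-out window of rows 301:320 and the name d_out are arbitrary choices made here, not part of the original code): the in-sample error is tiny, while the error just past the fitted window explodes.

# compare in-sample RMSE with RMSE on the 20 time steps right after the
# fitted window (rows 301:320 of the ODE output; illustrative choice)
d_out <- as.data.frame(out[301:320, ])
names(d_out) <- names(d)

sqrt(mean((predict(poly_fit, d) - d$infected)^2))          # in-sample RMSE: tiny
sqrt(mean((predict(poly_fit, d_out) - d_out$infected)^2))  # hold-out RMSE: huge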

  • For the SIR model, the differential equations describe the underlying physical laws and the interactions between the variables.

  • The curve-fitting approach, by contrast, just tries to minimize the loss with many parameters that have no physical meaning. As a result, we get a minimized loss / perfect fit on the training data, but the fitted system does not describe any physics. (A sketch of fitting the SIR parameters directly follows this list.)
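
To make the contrast concrete, here is a minimal sketch of the parametric alternative: estimating beta and gamma by least squares against the observed infected counts. The loss function sir_loss, the optim() starting values, and the least-squares criterion are illustrative choices, not a canonical fitting recipe.

# minimal sketch: recover beta and gamma by least squares on infected counts
sir_loss <- function(pars) {
  sol <- ode(init, times = seq(300), func = SIR,
             parms = c(beta = pars[1], gamma = pars[2]))
  sum((sol[50:300, "I"] - d$infected)^2)  # rows 50:300 line up with d$time
}
fit <- optim(c(0.05, 0.05), sir_loss)
fit$par  # should land close to the true beta = 0.1, gamma = 0.01

With only two parameters, each with a physical interpretation (contact rate and recovery rate), this fit has far less room to overfit than the 16-parameter polynomial.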


In terms of pros and cons, SIR fitting vs. polynomial fitting is very similar to the discussion of "parametric models vs. non-parametric models".

For example, consider fitting data with a normal distribution versus using kernel density estimation.

  • If the data really does come from a normal distribution, or mostly satisfies the model assumptions, then fitting a normal distribution is better than non-parametric estimation.

  • On the other hand, if the data is far away from the model assumptions, say it contains a lot of outliers, then non-parametric methods will give better results. (A short example follows this list.)
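
As a minimal illustration of this contrast (the seed and sample size are arbitrary choices), we can overlay both estimates on the same sample:

# parametric normal fit vs. non-parametric kernel density estimate
set.seed(1)
samp <- rnorm(200)  # data that really is normal

hist(samp, freq = FALSE, col = "grey90", main = "Normal fit vs. KDE")
curve(dnorm(x, mean(samp), sd(samp)), add = TRUE, col = "blue", lwd = 2)  # parametric
lines(density(samp), col = "red", lwd = 2)  # KDE

Here the two-parameter normal fit does at least as well as the KDE; swap in heavy-tailed or outlier-contaminated data and the KDE will track the sample more faithfully.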


A similar question has been asked:

What's wrong to fit periodic data with polynomials?

And one of the answers there still applies here:

Intuitively you want to fit function that (in some sense) looks like your underlying process. This way you'll have the fewest number of parameters to estimate. Say you have a round hole, and need to fit a cork into it. If your cork is square it's harder to fit it well than if the cork were round.