Solved – SIR model parameter estimation in R

epidemiologypoint-estimationr

searched on google about the sir model in r and I came up with the following code.

Infected <- c(1,3,4,7,7,7,7,9,31,45,66,73,84,89,99,117,190,217,319,340,368,399,439,466,498,590,649,694,767,824,886,966,1156)

SIR <- function(time, state, parameters) {
  par <- as.list(c(state, parameters))
  with(par, {
    dS <- -beta/N * I * S
    dI <- beta/N * I * S - gamma * I
    dR <- gamma * I
    list(c(dS, dI, dR))
  })
}

library(deSolve)
init <- c(S = N-Infected[1], I = Infected[1], R = 0)
RSS <- function(parameters) {
  names(parameters) <- c("beta", "gamma")
  out <- ode(y = init, times = Day, func = SIR, parms = parameters)
  fit <- out[ , 3]
  sum((Infected - fit)^2)
}


Opt <- optim(c(0.5, 0.5), RSS, method = "L-BFGS-B", lower = c(0, 0), upper = c(1, 1)) 

Opt_par <- setNames(Opt$par, c("beta", "gamma"))
Opt_par


t <- 1:190 # time in days



fit <- data.frame(ode(y = init, times = t, func = SIR, parms = Opt_par))

In this code, we want to estimate beta and gamma and then solve the ode with these values.

My question is the infected and recovered data are not used for the estimation of the beta and gamma except the first value of infected.
Wouldnt be more sufficient if we included all the infected data for the optimization of beta and gamma?

Best Answer

The statement

My question is the infected and recovered data are not used for the estimation of the beta and gamma except the first value of infected.

is incorrect. In the loss function (squared loss) definition, the code uses all infected data to determine the best $\beta$ and $\gamma$.

sum((Infected - fit)^2)

From fitting perspective, it should be fine, because we only ave very few parameters to fitting (in this case, just two parameters) and we have a lot of data (number of infections per day). It is more like a over determined system.

Related Question