Choosing the right forecast model for exponential data (COVID-19) with the forecast package in R

forecasting, r

I am trying to forecast aggregated daily COVID-19 cases in Europe. These are the numbers for Italy up to the present day.

temp <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 2,
          2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
          3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
          20, 62, 155, 229, 322, 453, 655, 888, 1128, 1694, 2036,
          2502, 3089, 3858, 4636, 5883, 7375, 9172, 10149, 12462, 12462)

My problem is that all the models underestimate the exponential growth pattern, such as this one with exponential smoothing. (If I try to predict using data up to the 4636 value, the different models estimate 8,000-9,000 when the real number was 12,462.) I have tried transformations, different models, etc.

library(data.table)
library(tidyverse)
library(forecast)
library(lubridate)

COVfirst <- min(which(temp > 0)) + 22  # index of the first non-zero count, offset by 22 (the series starts on day 22 of January)


temp2 <- ts(temp, start = c(2020, 22), 
            frequency = 365.25)

temp2 %>% autoplot

test <- ets(temp2,
            allow.multiplicative.trend =TRUE)


test %>% forecast(., h = 14) %>% autoplot()


ts_Italy_confirmed <- temp2
forecast_italy_Confirmed <- test %>% forecast(., h = 14)
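
The underestimation described above can be checked with a simple holdout comparison. The following is only a sketch: it refits ets() with automatic model selection on the data up to the 4636 value and compares a five-day forecast with the values that were actually observed. The names last_train, train, actual and comparison are illustrative and not from the original code, and a plain daily index (frequency 1) is assumed instead of frequency = 365.25. The exact forecast values depend on which model ets() selects.

```r
library(forecast)

temp <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 2,
          2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
          3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
          20, 62, 155, 229, 322, 453, 655, 888, 1128, 1694, 2036,
          2502, 3089, 3858, 4636, 5883, 7375, 9172, 10149, 12462, 12462)

last_train <- which(temp == 4636)   # last observation used for fitting
h <- 5                              # horizon that reaches the first 12462 value

train  <- ts(temp[1:last_train])    # assumption: plain daily index, frequency 1
actual <- temp[(last_train + 1):(last_train + h)]

fit <- ets(train)                   # automatic model selection
fc  <- forecast(fit, h = h)

# Side-by-side comparison of held-out values and point forecasts
comparison <- data.frame(actual   = actual,
                         forecast = round(as.numeric(fc$mean)))
print(comparison)
```

Because the training data contain zeros, ets() only considers additive-error models here, which is one reason the automatic choice tends to lag behind multiplicative growth.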

I am a little confused by this, because the development up to the present day is actually pretty straightforward (exponential). I don't like fitting an exponential regression model as this will not catch up when the exponential part of the epidemic stops. (I think)
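
For reference, an exponential regression of the kind mentioned here could look roughly like the sketch below, which uses base R only. It regresses the log of the positive counts on a day index; the names y, day, growth, new_days and pred are illustrative. The back-transformation with exp() ignores retransformation bias, and the long plateau at 2 and 3 cases will distort the estimated growth rate, so this is an illustration rather than a recommended model.

```r
temp <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 2,
          2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
          3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
          20, 62, 155, 229, 322, 453, 655, 888, 1128, 1694, 2036,
          2502, 3089, 3858, 4636, 5883, 7375, 9172, 10149, 12462, 12462)

y   <- temp[temp > 0]               # log() needs strictly positive values
day <- seq_along(y)

fit <- lm(log(y) ~ day)             # log-linear (exponential) regression

growth <- exp(coef(fit)[["day"]])   # estimated daily growth factor

# Extrapolate 14 days ahead and back-transform to the original scale
new_days <- data.frame(day = max(day) + 1:14)
pred <- exp(predict(fit, newdata = new_days))

print(growth)
print(round(pred))
```

The fitted curve keeps growing by the same factor every day, which is exactly the behaviour the concern above is about.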

Best Answer

You can force ets() to use a model with multiplicative trend (and multiplicative error) by using the parameter model="MMN". Of course, you need to start the series later, since multiplicative trends and errors don't make sense for zero values.

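# Drop the nine leading zeros so that every observation is strictly positive
temp3 <- ts(temp[-(1:9)], start = c(2020, 32),
            frequency = 365.25)
# "MMN": multiplicative error, multiplicative trend, no seasonality
test <- ets(temp3, model = "MMN")
test %>% forecast(., h = 14) %>% autoplot()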

[Figure: forecast plot]

I certainly hope this graphic is what you wanted.

It also illustrates why ets() is very careful about fitting multiplicative trends on its own. They can and will explode. Also:
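
To see why a multiplicative trend explodes, recall that its h-step point forecast is the last level times the growth factor raised to the power h, whereas an additive trend adds a fixed amount per step. The sketch below uses made-up values for the growth factor and the additive slope (they are not estimates from ets()) and takes the last observed count as a stand-in for the fitted level.

```r
level  <- 12462   # last observed value, used as a stand-in for the fitted level
growth <- 1.2     # hypothetical daily growth factor, not an ets() estimate
slope  <- 2000    # hypothetical additive daily increase, for comparison
h      <- 1:14

multiplicative <- level * growth^h   # compounds: level * b^h
additive       <- level + slope * h  # grows linearly: level + h * slope

print(round(data.frame(h, additive, multiplicative)))
```

Any error in the growth factor is compounded h times, so even a small misestimate has a large effect on the forecast at longer horizons.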

"I don't like fitting an exponential regression model as this will not catch up when the exponential part of the epidemic stops."

Of course, ets() will not know when to stop extrapolating the exponential growth, so this (extremely correct) rationale applies equally to ets(). You may want to consider models that are explicitly tailored towards epidemiology or (market) penetration, like the Bass diffusion model or similar.
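
As a rough illustration of how a diffusion-type model differs, the sketch below evaluates the cumulative Bass curve m * (1 - exp(-(p + q) t)) / (1 + (q / p) exp(-(p + q) t)) in base R. The parameter values m, p and q are purely hypothetical and are not fitted to the Italian data; the point is only that the curve saturates at m instead of growing without bound.

```r
# Cumulative Bass diffusion curve
# m: eventual total, p: "innovation" coefficient, q: "imitation" coefficient
bass_cumulative <- function(t, m, p, q) {
  decay <- exp(-(p + q) * t)
  m * (1 - decay) / (1 + (q / p) * decay)
}

t <- 0:60
# Hypothetical parameters chosen for illustration only
cumulative <- bass_cumulative(t, m = 100000, p = 0.001, q = 0.3)
new_cases  <- diff(cumulative)      # implied new cases per day

peak_day <- which.max(new_cases)    # day on which daily growth is largest

print(round(cumulative[c(1, 11, 21, 31, 61)]))
print(peak_day)
```

The early part of such a curve looks exponential, but daily increments peak and then decline, which is the behaviour that ets() and a plain exponential regression cannot reproduce.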

EDIT: Rob Hyndman explains in more depth why smoothing and similar models do not make a lot of sense for forecasting COVID-19, and gives pointers to more appropriate models. Ivan Svetunkov has written on this as well.