Solved – Choosing the right forecast model for exponential data (COVID19) forecast package R

forecasting r

I am trying to forecast aggregated daily COVID cases in Europe. These are the numbers for Italy up to the present day.

temp <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 2,
          2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
          3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
          20, 62, 155, 229, 322, 453, 655, 888, 1128, 1694, 2036,
          2502, 3089, 3858, 4636, 5883, 7375, 9172, 10149, 12462, 12462)

My problem is that all the models underestimate the exponential growth pattern, including this one with exponential smoothing. (If I fit the models on the data up to the value 4,636, they predict 8,000 to 9,000 when the real number was 12,462.) I have tried transformations, different models, etc.

library(data.table)
library(tidyverse)
library(forecast)
library(lubridate)

COVfirst <- min(which(temp > 0)) + 22 # the series starts on day 22 of January


temp2 <- ts(temp, start = c(2020, 22), 
            frequency = 365.25)

temp2 %>% autoplot

test <- ets(temp2,
            allow.multiplicative.trend =TRUE)


test %>% forecast(., h = 14) %>% autoplot()


ts_Italy_confirmed <- temp2
forecast_italy_Confirmed <- test %>% forecast(., h = 14)

I am a little confounded by this, because the development up to the present day is actually pretty straightforward (exponential). I don't like fitting an exponential regression model, as it will not adapt when the exponential part of the epidemic stops (I think).
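For reference, the exponential regression mentioned here can be sketched as a linear model on the log scale. This is only an illustration in base R: the names cases, day and fit are made up for this example, and the data are the last 21 values of temp.

```r
# Last 21 values of temp (from the first double-digit count onward)
cases <- c(20, 62, 155, 229, 322, 453, 655, 888, 1128, 1694, 2036,
           2502, 3089, 3858, 4636, 5883, 7375, 9172, 10149, 12462, 12462)
day <- seq_along(cases)

# Exponential regression as a linear model on the log scale: log(y) = a + b * day
fit <- lm(log(cases) ~ day)
growth_rate <- unname(coef(fit)["day"])   # estimated daily growth rate (log scale)
doubling_time <- log(2) / growth_rate     # days needed for the cases to double

# Extrapolate 14 days ahead; the fitted curve keeps growing at the same rate
future <- data.frame(day = max(day) + 1:14)
pred <- exp(predict(fit, newdata = future))
```

The fitted growth rate is constant by construction, which is exactly why such a model cannot react when the exponential phase ends.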

Best Answer

You can force ets() to use a model with multiplicative trend (and multiplicative error) by using the parameter model="MMN". Of course, you need to start the series later, since multiplicative trends and errors don't make sense for zero values.

temp3 <- ts(temp[-(1:9)], start = c(2020, 32), 
            frequency = 365.25)
test <- ets(temp3,model="MMN")
test %>% forecast(., h = 14) %>% autoplot()
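The holdout check described in the question (fit on the data up to 4636, then compare with what followed) could be repeated for this model roughly as below. This is a sketch, not a verified result: the names y, n_train, actual and comparison are made up here, and the series is written out again only so that the block runs on its own.

```r
library(forecast)

# Positive part of the series, i.e. temp[-(1:9)], written out explicitly
y <- c(rep(2, 7), rep(3, 14),
       20, 62, 155, 229, 322, 453, 655, 888, 1128, 1694, 2036,
       2502, 3089, 3858, 4636, 5883, 7375, 9172, 10149, 12462, 12462)

n_train <- which(y == 4636)              # index of the last training observation
train <- ts(y[1:n_train])
actual <- y[(n_train + 1):length(y)]     # the held-out days

# Multiplicative error, multiplicative trend, no seasonality
fit <- ets(train, model = "MMN")
fc <- forecast(fit, h = length(actual))

# Put point forecasts and realized values side by side
comparison <- data.frame(forecast = as.numeric(fc$mean), actual = actual)
print(comparison)
```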

I certainly hope this graphic is what you wanted.

It also illustrates why ets() is very careful about fitting multiplicative trends on its own. They can and will explode. Also:

"I don't like fitting an exponential regression model as this will not catch up when the exponential part of the epidemic stops."

Of course, ets() will not know when to stop extrapolating the exponential growth, so this (extremely correct) rationale applies equally to ets(). You may want to consider models that are explicitly tailored towards epidemiology or (market) penetration, like the Bass diffusion model or similar.
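To show what such a model looks like, here is a minimal sketch of the cumulative curve of the Bass diffusion model in its standard form. The function name bass_cumulative and all parameter values are hypothetical and chosen only to show the shape; nothing here is fitted to the Italian data.

```r
# Cumulative count under the Bass diffusion model.
# m = eventual total, p = innovation coefficient, q = imitation coefficient.
bass_cumulative <- function(t, m, p, q) {
  m * (1 - exp(-(p + q) * t)) / (1 + (q / p) * exp(-(p + q) * t))
}

# Hypothetical parameter values, for illustration only
t <- 0:120
curve_bass <- bass_cumulative(t, m = 100000, p = 0.0005, q = 0.15)

# Early on the curve grows almost exponentially; later it flattens out below m
early_growth <- curve_bass[11] / curve_bass[6]      # ratio between day 10 and day 5
late_growth  <- curve_bass[121] / curve_bass[116]   # ratio between day 120 and day 115
```

Unlike a multiplicative trend, the curve has a built-in ceiling m, so extrapolating it cannot explode. The hard part in practice is estimating m, p and q while the data are still in the early growth phase.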

EDIT: Rob Hyndman explains in more depth why smoothing and similar models do not make a lot of sense to forecast COVID-19, and gives pointers to more appropriate models. And here is Ivan Svetunkov.

Related Solutions

Solved – Conditional model using function tslm in R package forecast

You can do this by setting up the seasonal factors yourself. I'm assuming you have hourly data over three weeks, and that each week has 7 days.

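library(forecast)

x <- ts(rnorm(21*24), frequency = 24)   # three weeks of hourly data
dow <- rep(rep(1:7, rep(24, 7)), 3)     # day of week (1-7) for every hour
business.dummy <- (dow <= 5)            # TRUE on days 1-5 of each week
seasons <- cycle(x)                     # hour of day, 1-24
seasons[!business.dummy] <- seasons[!business.dummy] + 24   # weekend hours become 25-48
seasons <- factor(seasons, levels = 1:48,
    labels = c(paste("Week", 1:24), paste("Weekend", 1:24)))
fit <- tslm(x ~ seasons - 1)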

The seasons factor has 48 levels, the first 24 corresponding to weekday hours, and the second 24 corresponding to weekend hours. You can generalize to allow other non-business days by setting the relevant values of business.dummy to FALSE.
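As a sketch of that generalization, the block below marks one extra day as a non-business day before building the factor. The choice of day 10 as a holiday and the helper day_index are hypothetical. It also assumes, as above, that each week in the sample starts on a Monday.

```r
library(forecast)

set.seed(1)
x <- ts(rnorm(21 * 24), frequency = 24)        # three weeks of hourly data
dow <- rep(rep(1:7, each = 24), times = 3)     # day of week for every hour
business.dummy <- (dow <= 5)

# Hypothetical: day 10 of the sample (a Wednesday) is a public holiday
day_index <- rep(1:21, each = 24)              # running day number for every hour
business.dummy[day_index == 10] <- FALSE

# Same construction as before: non-business hours are shifted to levels 25-48
seasons <- as.integer(cycle(x))
seasons[!business.dummy] <- seasons[!business.dummy] + 24
seasons <- factor(seasons, levels = 1:48,
                  labels = c(paste("Week", 1:24), paste("Weekend", 1:24)))
fit <- tslm(x ~ seasons - 1)
```

The holiday hours are then estimated together with the weekend hours, so the "Weekend" coefficients are really "non-business day" coefficients.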

Solved – Choosing the right size of an out of sample data

Not really my area of expertise but I think one answer should be “Nothing!” You could of course try to improve the model or try other techniques (but if you begin tuning the model on the basis of its performance in the test set, you still run the risk of “overfitting”) but changing the size of the test set does not seem to directly address this problem.

If today were still 2011 and we were trying to predict electricity consumption until 2013, this model would give us some seriously misleading predictions. This is precisely the type of thing out-of-sample evaluation is supposed to pick up. You can look at it retrospectively today and interpret it as a trend that started last year because of some change to the electricity market, but the conclusion remains the same: this model did not allow you to see it coming.

Also, if you read the sentence carefully, you will notice that Rob Hyndman also stresses that the size of the test set should depend on how far ahead you want to forecast. Intuitively, if you want to predict 24 months, a test set of 7 months is too short, no matter whether you have 100 or 10000 months of past data. For example, a good model of seasonal changes in your data could look very good over 7 months even if it is unable to predict any year-on-year trend.
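A minimal sketch of a holdout sized to the forecast horizon, in base R. It uses the AirPassengers series that ships with R purely as a stand-in, and a seasonal naive benchmark instead of any particular model; the names horizon, train, test and mae are made up for this example.

```r
# Monthly example series that ships with R (a stand-in for your own data)
y <- as.numeric(AirPassengers)
horizon <- 24                          # how far ahead we want to forecast

# Hold out exactly `horizon` observations at the end of the series
train <- head(y, length(y) - horizon)
test  <- tail(y, horizon)

# Seasonal naive benchmark: repeat the last observed year of the training data
last_year <- tail(train, 12)
pred <- rep(last_year, length.out = horizon)

# Out-of-sample mean absolute error over the full 24-month horizon
mae <- mean(abs(test - pred))
```

Because the benchmark only repeats the seasonal pattern, its error over 24 months includes whatever year-on-year trend it misses, which a 7-month test set would show only in part.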
