Solved – Sales forecast with an ARIMA model

arima, forecasting, time series

I'm trying to understand if an ARIMA model could be improved.
This is my dataset (sales):

28.35, 51.89, 37.26, 48.22, 30.93, 43.54, 35.3, 59.45, 49.41, 65.61, 36.59, 51.25, 31.42, 53.16, 39.41, 64.45, 43.94, 79.36, 52.93, 74.99, 55.03, 86.93, 41.69, 62.77, 41.29, 59.95, 40.07, 66.13, 47.15, 85.12, 74.44, 76.42, 49.17, 82.66, 49.88, 70.98, 52.83, 75.85, 61.4, 85.2, 61.99, 90.68, 48.05, 74.2, 41.7, 68, 46.41, 82.23, 62.18, 88.65, 65.21, 100.9, 46.63, 83.53, 56.57, 108.87, 51.01, 80.15, 57.03, 87.91, 62.41, 96.11, 71.41, 82.08, 62.5, 88.52, 60.53, 100.15, 67.74, 111.88, 74.64, 138.64, 97.88, 153.88, 111.34, 176.4, 67.57, 111.95, 72.36, 118.85, 82.19, 136.88, 84.95, 160.58, 64.13, 111.32, 64.65, 113.82, 74.75, 118.76, 86.28, 166.36, 71.82, 119.83, 67.64, 116.17, 77.83, 130.64, 95.23, 149.84, 115.97, 189.69, 96.35, 137.51, 82.04, 139.19, 70.68, 135.22, 69.84, 105.7, 65.47, 111.47, 63.71, 108.23, 66.81, 117.96, 86.82, 141.74, 71.97, 122.65, 89.35, 133.97, 110.07, 159.18, 117.4, 196.9, 167.69, 244.75, 85.43, 135.54, 70.51, 118.3, 78.83, 139.85, 108.57, 162.66, 139.03, 203.72, 94.37, 135.92, 80.35, 128.63, 90.2, 157.56, 112.91, 177.07, 147.28, 221.67, 90.86, 142.66, 93.96, 157.89, 121.5, 200.35, 140.08, 306.36, 187.86, 171.39, 113.52, 174.2, 108.89, 170.53, 121.49, 193.65, 148.72, 210.61, 168.46, 250.4, 213.54, 181.78, 126.56, 190.46, 137.85, 226.25, 148.68, 235.04, 170.39, 275.04, 106.68, 163.24, 109.15, 186.46, 129.33, 156.18, 91.03, 159.87, 119.43, 164.51, 92.84, 145, 87.02, 156.55, 92.76, 140.93, 102.72, 143.41, 92.11, 159.72, 96.44, 156.98,
151.38, 221.12, 174.89, 242.53, 117.66, 163.44, 111.25, 169.58, 103.27, 163.09, 105.62, 186.64, 124.75, 145.65, 108.31, 165.3, 101.91, 156.55, 101.72, 147.11, 106.25, 185.68, 146.83, 192.05, 101.46, 153.65, 105.91, 170.1, 97.07, 165.05, 106.06, 167.25, 102.68, 197.21, 99.19, 169.58, 106.66, 196.44, 103.46, 165.62, 108.77, 188.32, 117.03, 241.48, 171.6, 189.78, 110.79, 166.22, 116.14, 229.75, 144.17, 205.75, 137.51, 216.51, 111.98, 186.34, 138.92, 218.35, 172.29, 271.53, 143.24, 272.35, 274.9, 232.97, 238, 234.88, 172.19, 260.82, 143.12, 217.38, 136.56, 209.91, 144.57, 253.58, 171.79, 264.78, 189.01, 298.97, 231.23, 315.29, 198.05, 318.52, 183.21, 232.33, 161.4, 261.82, 145.56, 218.09, 140.13, 215, 154.87, 293.88, 164.71, 256.85, 192.69, 306.87, 255.16, 382.27, 298.13, 438.22, 183.88, 279.56, 217.82, 371.55, 269.81, 383.89, 211.72, 330.02, 217.97, 312.64, 227.47, 329.25, 238.65, 363.8, 280.39, 453.38, 363.84, 486.65, 647.67, 534.41, 219.69, 292.16, 209.73, 336.33, 226.43, 336.23, 249.48, 359.84, 188.05, 307.73, 231.67, 330.43, 252.22, 379.3, 293.54, 413.67, 384.64, 515.86, 482.36, 438.12

sales = as.ts(sales, frequency=2)

There are two observations per week.
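It may be worth confirming that the frequency was actually set. As far as I know, the default `as.ts()` method does not use a `frequency` argument, whereas `ts()` does. The sketch below checks this on the first eight posted values; the names `x`, `a` and `b` are illustrative only.

```r
# First eight values of the posted series (illustrative subset).
x <- c(28.35, 51.89, 37.26, 48.22, 30.93, 43.54, 35.3, 59.45)

a <- as.ts(x, frequency = 2)  # extra argument is passed via '...' to the default method
b <- ts(x, frequency = 2)     # ts() takes 'frequency' directly

# Compare what each object reports as its frequency.
frequency(a)
frequency(b)
```

If the two calls report different values, a seasonal search such as `auto.arima(..., seasonal=TRUE)` would only see a period of 2 on the series built with `ts()`.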
In this time series I see seasonality and trend. Correct me if I'm wrong.
After a log transformation to stabilize variance:

sales.transformed <- log(sales)

The model given by auto.arima is:

fit <- auto.arima(sales.transformed, seasonal=TRUE)
summary(fit)

Series: sales.transformed
ARIMA(3,1,1) with drift

Coefficients:
         ar1     ar2      ar3      ma1   drift
      0.2340  0.6310  -0.4893  -0.8913  0.0064
s.e.  0.0541  0.0471   0.0485   0.0407  0.0018

sigma^2 estimated as 0.03276:  log likelihood=96.98
AIC=-181.96   AICc=-181.71   BIC=-159.01

But residuals don't behave like white noise:

res <- residuals(fit)
Acf(res, main="ACF of residuals")
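A formal complement to the ACF plot is a Ljung-Box test on the residuals. The following is a minimal sketch with base R's `Box.test()`, run on only the first 40 posted values and on a smaller ARIMA(1,1,1) fitted with `stats::arima()` to keep the short fit stable. The lag of 10 and the names `x`, `lx`, `fit_sub` and `lb` are assumptions for illustration. For the ARIMA(3,1,1) in the question, `fitdf` would be 4 (three AR terms plus one MA term).

```r
# First 40 values of the posted series (illustrative subset).
x <- c(28.35, 51.89, 37.26, 48.22, 30.93, 43.54, 35.3, 59.45, 49.41, 65.61,
       36.59, 51.25, 31.42, 53.16, 39.41, 64.45, 43.94, 79.36, 52.93, 74.99,
       55.03, 86.93, 41.69, 62.77, 41.29, 59.95, 40.07, 66.13, 47.15, 85.12,
       74.44, 76.42, 49.17, 82.66, 49.88, 70.98, 52.83, 75.85, 61.4, 85.2)
lx <- log(x)

# Small non-seasonal model, fitted by maximum likelihood.
fit_sub <- arima(lx, order = c(1, 1, 1), method = "ML")

# Ljung-Box test; fitdf = number of estimated AR + MA coefficients.
lb <- Box.test(residuals(fit_sub), lag = 10, type = "Ljung-Box", fitdf = 2)
lb$p.value  # a small p-value suggests autocorrelation remains in the residuals
```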

Then I tried a seasonal ARIMA model.

fit <- Arima(sales.transformed, order=c(3,1,1), 
             seasonal=list(order=c(2,0,1), period=2))

Things are getting better:

Series: sales.transformed 
ARIMA(3,1,1)(2,0,1)[2]                    

Coefficients:
          ar1     ar2     ar3     ma1    sar1     sar2     sma1
      -1.3075  0.2398  0.5474  0.9721  0.0648  -0.0803  -0.7839
s.e.   0.0630  0.1221  0.1010  0.0401  0.0848   0.0657   0.0606

sigma^2 estimated as 0.02667:  log likelihood=131.05
AIC=-246.09   AICc=-245.65   BIC=-215.48

A third fit, ARIMA(5,1,0) with drift and a Box-Cox transformation with lambda = 0, looks even better:

ARIMA(5,1,0) with drift         
Box Cox transformation: lambda= 0 

Coefficients:
          ar1     ar2      ar3      ar4      ar5   drift
      -0.4224  0.0143  -0.2956  -0.0849  -0.3238  0.0015
s.e.   0.0516  0.0563   0.0541   0.0564   0.0516  0.0009

sigma^2 estimated as 0.001257:  log likelihood=649.72
AIC=-1285.44   AICc=-1285.1   BIC=-1258.65

I don't know if the model can be improved or if this is the best an ARIMA model can do, given my time series. I should probably try autoregressive mixed models and add new predictors…

Any advice would be really appreciated.

Best Answer

Since your data has an upward trend, it is good that your model has one too. The data looks roughly exponential, so using a log transform is a good idea.

However, your model's variance looks lower than your data's variance. I would try more autoregressive terms, e.g. ARIMA(7,1,0) or ARIMA(9,1,0). This might help.
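One way to try this is to fit several AR orders and compare them by AIC, which is valid here because all candidates use the same data and the same differencing. This is a sketch on the first 40 posted values using `stats::arima()`; the candidate orders 3, 5 and 7 and the names `orders` and `aics` are chosen for illustration.

```r
# First 40 values of the posted series (illustrative subset), on the log scale.
x <- c(28.35, 51.89, 37.26, 48.22, 30.93, 43.54, 35.3, 59.45, 49.41, 65.61,
       36.59, 51.25, 31.42, 53.16, 39.41, 64.45, 43.94, 79.36, 52.93, 74.99,
       55.03, 86.93, 41.69, 62.77, 41.29, 59.95, 40.07, 66.13, 47.15, 85.12,
       74.44, 76.42, 49.17, 82.66, 49.88, 70.98, 52.83, 75.85, 61.4, 85.2)
lx <- log(x)

# Fit ARIMA(p,1,0) for each candidate p and collect the AIC values.
orders <- c(3, 5, 7)
aics <- sapply(orders, function(p) {
  AIC(arima(lx, order = c(p, 1, 0), method = "ML"))
})
names(aics) <- paste0("ARIMA(", orders, ",1,0)")

aics  # lower is better among these candidates
```

On the full series the same loop could include higher orders such as 9; on a short subset, very high orders leave too few observations per parameter.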

If possible, you could also average every 2 data points before analyzing. This would produce one data point per week and eliminate the short, regular fluctuation, which is really not that interesting. This should produce a better forecast.
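The pairwise averaging can be sketched in base R as follows, assuming the series has an even number of observations and starts on the first observation of a week. Only the first eight posted values are used, and the names `x`, `weekly` and `weekly_ts` are illustrative.

```r
# First eight values of the posted series (illustrative subset).
x <- c(28.35, 51.89, 37.26, 48.22, 30.93, 43.54, 35.3, 59.45)

# Put each week's two observations in one column, then average each column.
weekly <- colMeans(matrix(x, nrow = 2))

# One value per week, so no within-week seasonality remains.
weekly_ts <- ts(weekly, frequency = 1)
weekly_ts
```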

Also, check that you reversed your log transform on the model results (i.e. applied exp()) before plotting them against the actual data. A missed back-transform might be the cause of the variance mismatch.
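A minimal sketch of the back-transform, using `stats::arima()` and `predict()` on the first 40 posted values. The ARIMA(1,1,1) order, the 4-step horizon, the 1.96 multiplier for an approximate 95% interval, and all variable names are assumptions for illustration. With the forecast package, passing `lambda = 0` to `Arima()` should handle this step automatically, as the "Box Cox transformation: lambda= 0" output in the question suggests.

```r
# First 40 values of the posted series (illustrative subset).
x <- c(28.35, 51.89, 37.26, 48.22, 30.93, 43.54, 35.3, 59.45, 49.41, 65.61,
       36.59, 51.25, 31.42, 53.16, 39.41, 64.45, 43.94, 79.36, 52.93, 74.99,
       55.03, 86.93, 41.69, 62.77, 41.29, 59.95, 40.07, 66.13, 47.15, 85.12,
       74.44, 76.42, 49.17, 82.66, 49.88, 70.98, 52.83, 75.85, 61.4, 85.2)

# Fit on the log scale.
fit_sub <- arima(log(x), order = c(1, 1, 1), method = "ML")

# Forecast on the log scale, then map back with exp().
fc <- predict(fit_sub, n.ahead = 4)
point_orig <- exp(fc$pred)                   # median forecast on the original scale
lower_orig <- exp(fc$pred - 1.96 * fc$se)    # approximate 95% lower bound
upper_orig <- exp(fc$pred + 1.96 * fc$se)    # approximate 95% upper bound

cbind(lower_orig, point_orig, upper_orig)
```

Note that `exp()` of the log-scale point forecast gives the median rather than the mean on the original scale, and the back-transformed interval is asymmetric around it.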

I like your idea of looking for other predictors. This would probably help with those outlier points that don't follow the underlying patterns.