Solved – Forecasting with ARIMA (Training and Test Data Split)

arima, forecasting, fourier transform, r, time series

I have an hourly time series of the average parking occupancy with data available from September 2017 up until June 2018. I would like to use the ARIMA model with external regressors to produce a forecast for the next 24 hours. The data is available here.

The external regressors that I am using are: week days (1=Monday to 7=Sunday), average traffic flow, and the Fourier terms.

This is what I have done up until now:

1) Checked the dominant frequency/frequencies in my data using the periodogram. The dominant period was 24 hours (as expected).

> library(forecast)
> library(TSA) # periodogram() comes from the TSA package, not from forecast
> out=periodogram(Parking$AvgOccupied)
> wmax=which.max(out$spec)
> freq=1/out$freq[wmax]
> 1/out$freq[wmax]
[1] 24.02402402
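For readers without the TSA package, the same check can be sketched with base R's spec.pgram(). The series below is synthetic (a 24-hour sine wave plus noise) and only stands in for Parking$AvgOccupied; the variable names are assumptions, not part of the original data.

```r
# Synthetic hourly series with a 24-hour cycle (a stand-in, not the parking data)
set.seed(1)
n <- 720                                   # 30 days of hourly observations
t <- seq_len(n)
x <- 300 + 100 * sin(2 * pi * t / 24) + rnorm(n, sd = 10)

# Raw periodogram from base R (stats); plot = FALSE returns the estimates only
pg <- spec.pgram(x, taper = 0, plot = FALSE)

# Frequency with the largest spectral density, converted to a period in hours
dominant_period <- 1 / pg$freq[which.max(pg$spec)]
dominant_period
```

On this synthetic series the dominant period should come out at 24, mirroring the result above.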

2) Split my data into training and test data. Even though I already have the average parking occupancy for June 2018, I am using it as test data so that I can check the accuracy of my model against it.

> Parking.Train=Parking[1:6552,] # From 01 Sep 2017 to 31 May 2018
> Parking.Test=Parking[6553:7272,] # From 01 Jun 2018 to 30 Jun 2018
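The hard-coded row ranges can be cross-checked by splitting on a calendar boundary instead. This is a minimal sketch on a made-up data frame: the Timestamp column and the Demo objects are hypothetical, since the original post does not show how dates are stored in Parking.

```r
# Hypothetical hourly timestamps covering the period described above
# (UTC is used so that daylight-saving changes do not add or drop hours)
Timestamp <- seq(as.POSIXct("2017-09-01 00:00:00", tz = "UTC"),
                 as.POSIXct("2018-06-30 23:00:00", tz = "UTC"),
                 by = "hour")
Demo <- data.frame(Timestamp = Timestamp)

# Split on the calendar boundary instead of hard-coded row numbers
cutoff <- as.POSIXct("2018-06-01 00:00:00", tz = "UTC")
Demo.Train <- Demo[Demo$Timestamp <  cutoff, , drop = FALSE]
Demo.Test  <- Demo[Demo$Timestamp >= cutoff, , drop = FALSE]

c(train = nrow(Demo.Train), test = nrow(Demo.Test))
```

With a complete hourly index this gives 6552 training rows and 720 test rows, which agrees with the ranges 1:6552 and 6553:7272 used above.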

3) Convert the training data to a ts object.

ParkingTS=ts(Parking.Train$AvgOccupied,
             frequency=24,
             start=c(as.Date("2017-09-01"))) 
ParkingTS1=ts(Parking.Test$AvgOccupied,
             frequency=24,
             start=c(as.Date("2018-06-01"))) 

4) Fit the model with the external regressors (this code is courtesy of Dr. Rob Hyndman: https://robjhyndman.com/hyndsight/forecasting-weekly-data/)

> bestfit=list(aicc=Inf)

> for(i in 1:11) {
   # Fit with K = i Fourier pairs; stop as soon as the AICc stops improving
   ParkingARIMA=auto.arima(ParkingTS,
                   xreg=cbind(model.matrix(~Parking.Train$WeekDay)[,-1],
                              Parking.Train$AvgTrafficFlow,
                              forecast::fourier(ParkingTS, K=i)),
                   seasonal=FALSE)
   if(ParkingARIMA$aicc < bestfit$aicc) {
     bestfit = ParkingARIMA
   } else break
 }

The resulting model is ARIMA(0,1,5) with K=4 Fourier terms.

5) I would now like to forecast the average parking occupancy for the next 24 hours using the regressors in the test data. I take the model obtained in step 4, combine the test-data regressors (week days and traffic flow) with Fourier terms computed from the test data, and pass them to the forecast() function with h=24. I then compute the accuracy of the forecast against the average parking occupancy in the test data.

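> # Note: when xreg is supplied, forecast() ignores h and uses nrow(xreg),
> # so this call forecasts all 720 test hours, not just the next 24.
> ParkingForecast=forecast(bestfit,xreg=cbind(model.matrix(~Parking.Test$WeekDay)[,-1],
                                             Parking.Test$AvgTrafficFlow,
                                             forecast::fourier(ParkingTS1, K=4)))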
> acc=accuracy(ParkingForecast,Parking.Test$AvgOccupied)
> acc
               ME              RMSE         MAE          MPE         MAPE         MASE          ACF1
 Training set -0.005673853141 48.64258868 31.94747327 -1.531875066  8.176109728 0.5851921293 0.02495856147
 Test set     -6.410339968260 95.59476132 66.83084303 -5.812664624 17.743429782 1.2241620176            NA
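As background for step 5, the regressors returned by forecast::fourier() are sines and cosines of the time index and the seasonal period. The base R sketch below rebuilds such terms with a hypothetical helper, fourier_terms() (not part of any package), and compares terms that continue the training index with terms that restart at 1, as they do when a separate test series is used.

```r
# Hypothetical helper: K sine/cosine pairs for time index t and seasonal period m
fourier_terms <- function(t, K, m) {
  terms <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / m), cos(2 * pi * k * t / m))
  })
  out <- do.call(cbind, terms)
  colnames(out) <- paste0(rep(c("S", "C"), times = K),
                          rep(seq_len(K), each = 2), "-", m)
  out
}

n_train <- 6552   # number of training rows from step 2
h <- 24           # forecast horizon in hours

# Terms for the hours right after the training data (index continues)
future_terms  <- fourier_terms(n_train + seq_len(h), K = 4, m = 24)
# Terms computed as if the series restarted at index 1
restart_terms <- fourier_terms(seq_len(h), K = 4, m = 24)

isTRUE(all.equal(future_terms, restart_terms))
```

The two sets agree here only because 6552 is an exact multiple of 24; with a training length that is not a multiple of the period, restarting the index would shift the phase. Within the forecast package, fourier(ParkingTS, K=4, h=24) produces terms that continue from the end of the training series.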

QUESTIONS:

i) Is this forecasting strategy correct? Or have I missed the mark completely?

ii) Is it correct to re-estimate the Fourier terms for the test data?

NB: I am doing the above just as an experiment. I have already modelled my data using the auto.arima() function with week days and traffic flow as external regressors (without the Fourier terms), which gave a seasonal ARIMA model, ARIMA(3,0,3)(2,1,0)[24], with the accuracy measures below:

> acc1
                     ME        RMSE         MAE          MPE         MAPE        MASE             ACF1
Training set  0.01681395761 52.63164320 32.35382066 -1.284216761  8.012784474 0.592635325 -0.0009199141052
Test set     -2.47801257238 98.98536617 61.30672355 -3.091655364 15.528942136 1.122974947               NA
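To make the columns in these tables concrete, the basic measures can be computed by hand. This is a minimal sketch on three made-up values (not the parking data), with errors defined as actual minus forecast; MASE and ACF1 are left out.

```r
# Made-up actuals and forecasts, purely for illustration
actual          <- c(100, 200, 400)
forecast_values <- c(110, 190, 400)

err <- actual - forecast_values        # forecast errors
pct <- 100 * err / actual              # percentage errors

measures <- c(ME   = mean(err),
              RMSE = sqrt(mean(err^2)),
              MAE  = mean(abs(err)),
              MPE  = mean(pct),
              MAPE = mean(abs(pct)))
round(measures, 3)
```

A MAPE of 5 on this toy example reads the same way as the 15.5 and 17.7 in the test-set rows above: the average absolute error as a percentage of the actual value.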

Best Answer

If I understand correctly, you derive your Fourier terms from data that are only available after the test period. If you assume you can use these data, you might just as well observe the actual parking data and forecast those.

Or, in other words: no, you can only use predicted future information. For instance, you are able to predict tomorrow's weekday perfectly, so there is no problem in including the weekday. As to your parking data that you want to Fourier transform: in order to get an idea of how your algorithm performs, you will need to forecast it and Fourier transform that forecast.

Finally, you might also want to look at models that capture the seasonality directly.