Solved – Time series analysis of daily temperature data in R

arimaautocorrelationforecastingrtime series

I am pretty new to the topic of time series analysis and I am trying to use the package "forecast" on daily temperature data to predict the daily temperature in the future. To be precise, I just need one day after the given time series. My data looks like this:

As you can see there is seasonality in the dataset, every 365 days the cycle repeats. Additionally, there is no trend, so the mean and variance stays approximately the same throughout the cycles. From what I can see I need a approach for a seasonal time series without a trend. I already tried to do that with a Seasonal decomposition and then an ARIMA and a SARIMA model. Here is my code:

x<-ts(dataset,start = c(2011,1,1),end = c(2020,8,31),frequency = 365)
pred<-stlf(x,h=1,method="arima")

Just to show how good it fits I visualized the model for two following years with

stlf(x,h=2*365,method="arima") %>%
  autoplot()

That is not a bad fit but it is not precise enough for my purpose. so I tried to fit a SARIMA model. First of all I try to get the optimal parameters for the SARIMA model with the auto.arima() function. The problem is that it takes very long to calculate and I am not sure if this is the right approach. I wanted to continue with the parameters put in the samira.for() function and predict the future values afterwards.

fit<-auto.arima(x)

Maybe you guys could help me find the right approach. What could I change or is this even the right way to do it for my purpose? It was difficult for me to create a reproducible example but maybe you can also help me like that.

Thanks in advance!

Best Answer

ARIMA takes a long time to fit for time series with "long" seasonal cycles. It is good for quarterly data (4 periods to a cycle) or monthly data (12 periods to a cycle) - but as you found, it struggles with daily data and yearly seasonality (365.25 periods to a cycle).

An STL forecast is already a very good approach, and I would consider it a useful benchmark. It is a common finding in time series forecasting that very simple benchmarks are often surprisingly hard to improve on.

One potential approach would be to use harmonics as predictors, with periods equal to the length of a year (and half a year, and a third, ...). Feed these into the xreg parameter of auto.arima() to run a regression with ARIMA errors.

That is not a bad fit but it is not precise enough for my purpose.

Sometimes our requirements on forecast accuracy are simply too high and cannot be met. If my purpose is to win big at roulette, then a hit probability of 1/37 is also not precise enough - but there is nothing I can do about it. You may find How to know that your machine learning problem is hopeless? amusing reading. At some point, it is more useful to invest resources in mitigation of unavoidable forecast errors, rather than in pursuing higher accuracy.

Related Solutions

Solved – Daily Time Series Analysis

Your ACF and PACF indicate that you at least have weekly seasonality, which is shown by the peaks at lags 7, 14, 21 and so forth.

You may also have yearly seasonality, although it's not obvious from your time series.

Your best bet, given potentially multiple seasonalities, may be a tbats model, which explicitly models multiple types of seasonality. Load the forecast package:

library(forecast)

Your output from str(x) indicates that x does not yet carry information about potentially having multiple seasonalities. Look at ?tbats, and compare the output of str(taylor). Assign the seasonalities:

x.msts <- msts(x,seasonal.periods=c(7,365.25))

Now you can fit a tbats model. (Be patient, this may take a while.)

model <- tbats(x.msts)

Finally, you can forecast and plot:

plot(forecast(model,h=100))

You should not use arima() or auto.arima(), since these can only handle a single type of seasonality: either weekly or yearly. Don't ask me what auto.arima() would do on your data. It may pick one of the seasonalities, or it may disregard them altogether.

EDIT to answer additional questions from a comment:

How can I check whether the data has a yearly seasonality or not? Can I create another series of total number of events per month and use its ACF to decide this?

Calculating a model on monthly data might be a possibility. Then you could, e.g., compare AICs between models with and without seasonality.

However, I'd rather use a holdout sample to assess forecasting models. Hold out the last 100 data points. Fit a model with yearly and weekly seasonality to the rest of the data (like above), then fit one with only weekly seasonality, e.g., using auto.arima() on a ts with frequency=7. Forecast using both models into the holdout period. Check which one has a lower error, using MAE, MSE or whatever is most relevant to your loss function. If there is little difference between errors, go with the simpler model; otherwise, use the one with the lower error.

The proof of the pudding is in the eating, and the proof of the time series model is in the forecasting.

To improve matters, don't use a single holdout sample (which may be misleading, given the uptick at the end of your series), but use rolling origin forecasts, which is also known as "time series cross-validation". (I very much recommend that entire free online forecasting textbook.

So Seasonal ARIMA models cannot usually handle multiple seasonalities? Is it a property of the model itself or is it just the way the functions in R are written?

Standard ARIMA models handle seasonality by seasonal differencing. For seasonal monthly data, you would not model the raw time series, but the time series of differences between March 2015 and March 2014, between February 2015 and February 2014 and so forth. (To get forecasts on the original scale, you'd of course need to undifference again.)

There is no immediately obvious way to extend this idea to multiple seasonalities.

Of course, you can do something using ARIMAX, e.g., by including monthly dummies to model the yearly seasonality, then model residuals using weekly seasonal ARIMA. If you want to do this in R, use ts(x,frequency=7), create a matrix of monthly dummies and feed that into the xreg parameter of auto.arima().

I don't recall any publication that specifically extends ARIMA to multiple seasonalities, although I'm sure somebody has done something along the lines in my previous paragraph.

Solved – Adjusting daily time series data for the seasonal component

It's probably best to do a standard STL (Season, Trend, Level) decomposition. Here is what you'd do with dummy data:

set.seed(1)
series <- ts(sin((1:13000)*2*pi/365)+rnorm(13000),frequency=365)
foo <- stl(series,s.window="periodic")
plot(foo)

Note that "Time" here is in years, and that this algorithm already accounts for the trend, so you don't need to extract that beforehand.

Afterwards, you can extract the components from foo$time.series, for example foo$time.series[,"remainder"] for the residuals.

An alternative would be exponential smoothing or an equivalent state space model, but forecast::ets also doesn't work with frequencies greater than 24.

You may want to look through some of our earlier questions on daily time series.

Best Answer

Related Solutions

Solved – Daily Time Series Analysis

Solved – Adjusting daily time series data for the seasonal component

Related Question