Solved – “Frequency” value for seconds/minutes intervals data in R

arima, mape, multiple-seasonalities, r, time series

I'm using R (3.1.1) and ARIMA models for forecasting.
I would like to know what the "frequency" parameter, which is assigned in the ts() function, should be if I'm using time series data which is:

  1. separated by minutes and is spread over 180 days (1440 minutes/day)
  2. separated by seconds and is spread over 180 days (86,400 seconds/day).

If I recall the definition correctly, the "frequency" of a ts object in R is the number of observations per "season".

Question part 1:

What is the "season" in my case?

If the season is "day", then is the "frequency" for minutes = 1440 and 86,400 for seconds?

Question part 2:

Could the "frequency" also depend on what I am trying to achieve/forecast?
For example, in my case, I'd like to have a very short-term forecast:
one step ahead, 10 minutes each time.
Would it then be possible to consider the season as an hour instead of a day?
In that case, would frequency = 60 for minutes and frequency = 3600 for seconds?

For example, I've tried frequency = 60 for the minute data and got better results than with frequency = 1440 (using Fourier terms, as described by Hyndman at the link below):
http://robjhyndman.com/hyndsight/forecasting-weekly-data/

(The comparison used MAPE as the measure of forecast accuracy.)

If the results are completely arbitrary and the frequency cannot be changed,
what would the interpretation of using freq = 60 on my data actually be?

I also think it's worth mentioning that my data contains seasonality at every hour and every two hours (by observing the raw data and the Autocorrelation function)

Best Answer

The "frequency" is the number of observations per "cycle" (normally a year, but sometimes a week, a day, an hour, etc). This is the opposite of the definition of frequency in physics, or in Fourier analysis, where "period" is the length of the cycle, and "frequency" is the inverse of period. When using the ts() function in R, the following choices should be used.

Data      frequency
Annual     1
Quarterly  4
Monthly   12
Weekly    52

Actually, there are not 52 weeks in a year, but 365.25/7 = 52.18 on average. But most functions which use ts objects require integer frequency.

Once observations are recorded more often than once a week, there is usually more than one way of handling the frequency. For example, data observed every minute might have an hourly seasonality (frequency = 60), a daily seasonality (frequency = 24 x 60 = 1440), a weekly seasonality (frequency = 24 x 60 x 7 = 10080) and an annual seasonality (frequency = 24 x 60 x 365.25 = 525960). If you want to use a ts object, then you need to decide which of these is the most important.
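As a rough illustration of that choice, the sketch below builds two ts objects from the same minute-level values, once with an hour as the cycle and once with a day. The series itself is simulated (three days with an invented daily pattern), so the names and numbers are assumptions for illustration only; it uses base R alone.

```r
# Simulated minute-level data: 3 days with a made-up daily pattern plus noise
set.seed(1)
n <- 3 * 1440
x <- sin(2 * pi * seq_len(n) / 1440) + rnorm(n, sd = 0.1)

# Same observations, two different choices of "cycle"
x_hourly <- ts(x, frequency = 60)    # one cycle = one hour
x_daily  <- ts(x, frequency = 1440)  # one cycle = one day

frequency(x_hourly)  # 60
frequency(x_daily)   # 1440

# The time index is measured in cycles, so the same data
# span about 72 cycles (hours) or about 3 cycles (days)
tsp(x_hourly)
tsp(x_daily)
```

The values are identical in both objects; only the length of the cycle that seasonal methods will look for changes.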

An alternative is to use a msts object (defined in the forecast package) which handles multiple seasonality time series. Then you can specify all the frequencies that might be relevant. It is also flexible enough to handle non-integer frequencies.
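A minimal sketch of creating such an object, assuming the forecast package is installed. The data are simulated with an hourly and a daily pattern (both invented for the example), and the two seasonal periods are given in units of observations, i.e. minutes.

```r
library(forecast)

# Simulated minute-level data: 14 days with hypothetical hourly and daily cycles
set.seed(1)
n <- 14 * 1440
tt <- seq_len(n)
x <- sin(2 * pi * tt / 60) + 2 * sin(2 * pi * tt / 1440) + rnorm(n, sd = 0.2)

# Declare both seasonal periods (in observations per cycle)
y <- msts(x, seasonal.periods = c(60, 1440))

attr(y, "msts")  # the seasonal periods stored on the object
frequency(y)     # the underlying ts frequency defaults to the largest period
```

Non-integer periods can be passed the same way, for example seasonal.periods = c(7, 365.25) for daily data with weekly and annual cycles.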

You won't necessarily want to include all of these frequencies --- just the ones that are likely to be present in the data. As you have only 180 days of data, you can probably ignore the annual seasonality. If the data are measurements of a natural phenomenon (e.g., temperature), you might also be able to ignore the weekly seasonality.

With multiple seasonalities, you could use a TBATS model, or Fourier terms in a regression or ARIMA model. The fourier function from the forecast package will handle msts objects.
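The Fourier approach might look roughly like the following sketch, assuming the forecast package is installed. The data are simulated, the choice of K = c(2, 2) (two sine/cosine pairs per seasonal period) is arbitrary and would normally be selected by comparing AICc or out-of-sample accuracy, and the 10-step horizon simply mirrors the 10-minute forecast in the question.

```r
library(forecast)

# Simulated minute-level data: 3 days with hypothetical hourly and daily cycles
set.seed(1)
n <- 3 * 1440
tt <- seq_len(n)
x <- sin(2 * pi * tt / 60) + 2 * sin(2 * pi * tt / 1440) + rnorm(n, sd = 0.2)
y <- msts(x, seasonal.periods = c(60, 1440))

# Fourier terms for both periods; K is an assumption, not a tuned value
K <- c(2, 2)
xreg <- fourier(y, K = K)

# Regression on the Fourier terms with non-seasonal ARIMA errors
fit <- auto.arima(y, xreg = xreg, seasonal = FALSE)

# Forecast 10 minutes ahead; future Fourier terms come from the same call with h
fc <- forecast(fit, xreg = fourier(y, K = K, h = 10))
fc$mean
```

A TBATS model can be fitted to the same object with tbats(y), although it can be slow on long high-frequency series.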