Solved – Analysing some series in R – problems with TS/decompose functions

decomposition, forecasting, frequency, r, time series

I am trying to analyze and forecast European Emission Allowance (EUA) prices using data from January 2008 to March 2016, together with other series that might be related to that price, such as Brent or IBEX35 prices (I will use correlations or divergences between those series).

The problem is that I do not know how to set the frequency correctly when calling the ‘ts’ function on my series. I took the data from investing.com and sendeco2.com, which only report values on days when the stock market is open, so ‘365’ should not be the correct frequency value. I have checked how many values I have for each year and the results are:

-2008: 252 values
-2009: 250 values
-2010: 252 values
-2011: 249 values
-2012: 254 values
-2013: 254 values
-2014: 255 values
-2015: 254 values
-2016 (March): 105 values

This means that my series are irregular. I store each series (EUA, Brent, etc.) in a vector (total length of 2125 each) and work with those variables:

# Read the semicolon-separated file: one Date column followed by four numeric series
BDD <- read.csv("BDD.csv", colClasses = c("Date", rep("numeric", 4)), header = TRUE, sep = ";")
Date   <- BDD[, 1]
EUA    <- BDD[, 2]
Brent  <- BDD[, 3]
IBEX35 <- BDD[, 4]
PME    <- BDD[, 5]

I have also tested with freq=1 (but it is obviously a mistake because they are not yearly values) and I got this error:

> EUA_ts<-ts(EUA, frequency=1, start=c(2008,1), end=c(2016,3))
> EUA_decomp<-decompose(EUA_ts, type=c("additive"))
Error in decompose(EUA_ts, type = c("additive")) : 
  time series has no or less than 2 periods

Is there another possibility to decompose a series? Am I doing something wrong?

I have uploaded my dataset to Drive in case more detail is needed:
https://drive.google.com/open?id=0B7nP03_LfDvQZS1WNUdqbWM1X1U

Best Answer

There is an implied statement in your question that I want to clarify before giving an answer. You stated:

I have also tested with freq=1 (but it is obviously a mistake because they are not yearly values)

This implies either that frequency is always counted per year, or that all (and only) yearly data have a frequency of 1. Both statements are false. Solar cycles have a period of about 11 years, so yearly solar cycle data would have a frequency of about 11. On the other hand, something like server copy errors per copy attempt may be sampled more than once per second but still have a frequency of 1, since copy errors are a mostly random process with no seasonal cycle. When determining the frequency of a series, you need to think about what is causing the seasonality and not just rely on the sample rate (though that can be a good starting point). The answer below assumes you have correctly identified the seasonal period as 1 year.
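
To make the solar-cycle point concrete, here is a minimal sketch on simulated data (the series, its noise level and all variable names are made up for illustration, not taken from the question). Yearly observations with an 11-year cycle are given frequency = 11, which is what lets decompose estimate an 11-point seasonal figure:

```r
set.seed(42)

# Simulated yearly data: 110 years following a repeating 11-year pattern plus noise
n_years  <- 110
cycle_11 <- sin(2 * pi * (0:10) / 11)
x <- rep(cycle_11, length.out = n_years) + rnorm(n_years, sd = 0.1)

# One observation per year, but the seasonal cycle spans 11 observations
yearly <- ts(x, frequency = 11)
dec <- decompose(yearly)

# dec$figure holds one estimated seasonal effect per position in the cycle
round(dec$figure, 2)
```

Note that with frequency = 11, one unit of the ts time axis is a full cycle rather than a calendar year.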

As to your original question, there are a few ways you can deal with your data. The simplest would be to set the frequency to the mean number of samples per year; for the full years you list (2008 to 2015) that is 2020 / 8 = 252.5. A ts object does not need an integer frequency, so you can make a time series in R as below:

ts(rnorm(100), frequency = 14.73)
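
As a rough sketch of this first option, the per-year counts from the question can be averaged and passed to ts. The prices vector below is a simulated stand-in for the real EUA series, and rounding to 252 before calling decompose is a cautious assumption rather than a documented requirement:

```r
# Observations per full calendar year, copied from the question (2016 is partial)
obs_per_year <- c(252, 250, 252, 249, 254, 254, 255, 254)
mean_freq <- mean(obs_per_year)   # 2020 / 8 = 252.5

set.seed(1)
# Hypothetical stand-in for the 2125 EUA prices
prices <- 20 + cumsum(rnorm(2125, sd = 0.1))

# ts() accepts the non-integer frequency directly
eua_ts <- ts(prices, frequency = mean_freq, start = c(2008, 1))

# Cautious variant: a whole number of observations per year for decompose()
eua_ts_int <- ts(prices, frequency = 252, start = c(2008, 1))
eua_decomp <- decompose(eua_ts_int, type = "additive")
```

With 2125 observations there are more than eight full periods of 252, so the "less than 2 periods" error from the question does not arise.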

Another option would be to add the missing days back into the time series and use a frequency of 365 (or, even better, 365.24). If you have the date of each observation, you can use the zoo package in R to make an irregular time series and then fill in the missing values.

You can make the series using:

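library(zoo)  # provides zoo() and na.approx()
# 900 observations on randomly chosen days within a 1000-day window
x.Date <- as.Date("2003-02-01") + sample(1000, 900) - 1
x <- zoo(rnorm(900), x.Date)
# as.ts() expands the series onto a regular daily grid (NA on the missing days)
y <- ts(as.ts(x), frequency = 365.24)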

The missing values can be filled in many ways; one to consider is zoo::na.approx, which interpolates linearly between neighbouring observations. From there the series can be decomposed as normal.

decompose(na.approx(y))
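
For comparison, the same fill-in idea can be sketched without any packages. This is not the zoo code above but a base R stand-in: approx does the linear interpolation, the data are simulated, the variable names are illustrative, and the frequency is fixed at 365 to keep the seasonal period a whole number:

```r
set.seed(1)

# Hypothetical irregular daily series: 900 observed days out of 1000 calendar days
obs_dates <- sort(as.Date("2003-02-01") + sample(1000, 900) - 1)
obs_vals  <- rnorm(900)

# Every calendar day between the first and last observation
all_days <- seq(min(obs_dates), max(obs_dates), by = "day")

# Linear interpolation onto the full daily grid; observed days keep their values
filled <- approx(x = as.numeric(obs_dates), y = obs_vals,
                 xout = as.numeric(all_days))$y

# Regular daily series with a one-year seasonal period
y_filled <- ts(filled, frequency = 365)
dec <- decompose(y_filled)
```

Interpolated values are smoother than real observations, so it may be worth checking how much of the filled series is synthetic before reading much into the seasonal component.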

A few final notes and options:

1) decompose is a useful function, but you may also want to consider the stl decomposition for your data; even the decompose docs say that "stl provides a much more sophisticated decomposition."

2) You will always get an error if you try to use decompose or stl on a series with a frequency of 1. Both functions seek to separate the seasonal and trend components of the data, so if there is no seasonal component (i.e. frequency = 1) there is nothing for them to estimate. If instead you just want to separate trend from noise, you might consider using a moving average.
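
Both notes can be tried out on simulated data. The sketch below is only illustrative: the series is made up, s.window = "periodic" and the 31-point window are arbitrary choices, and try is used merely to show that the frequency = 1 calls stop with an error:

```r
set.seed(1)

# Simulated daily series: four years of linear trend + yearly cycle + noise
n   <- 365 * 4
day <- seq_len(n)
y   <- ts(0.01 * day + sin(2 * pi * day / 365) + rnorm(n, sd = 0.2),
          frequency = 365)

# Note 1: stl() as an alternative to decompose()
fit <- stl(y, s.window = "periodic")
colnames(fit$time.series)   # "seasonal" "trend" "remainder"

# Note 2: with frequency = 1 both functions refuse to run
flat <- ts(as.numeric(y), frequency = 1)
decompose_fails <- inherits(try(decompose(flat), silent = TRUE), "try-error")
stl_fails <- inherits(try(stl(flat, s.window = "periodic"), silent = TRUE),
                      "try-error")

# Trend versus noise only: centred moving average over k points
k <- 31
trend_ma <- stats::filter(flat, rep(1 / k, k), sides = 2)
```

The moving average leaves NA at both ends of the series (15 values on each side for k = 31), because a centred window cannot be filled there.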