STL on time series with missing values for anomaly detection

Tags: missing data, outliers, r, time series

I am trying to detect anomalous values in a time series of climatic data with some missing observations. Searching the web, I found many possible approaches. Of those, STL decomposition seems appealing, because it removes the trend and seasonal components so that the remainder can be studied. According to the paper STL: A Seasonal-Trend Decomposition Procedure Based on Loess, STL is flexible in how variability is assigned to the components, is robust to outliers, and can be applied despite missing values. However, when I try to apply it in R, with four years of observations and with all parameters defined according to http://stat.ethz.ch/R-manual/R-patched/library/stats/html/stl.html , I get the following errors:

"time series contains internal NAs" (when na.action=na.omit), and
"series is not periodic or has less than two periods" (when na.action=na.exclude).

I have double-checked that the frequency is correctly defined. I have seen related questions on blogs, but did not find any suggestion that solves this. Is it not possible to apply stl to a series with missing values? I am very reluctant to interpolate them, as I do not want to introduce (and consequently detect…) artifacts. For the same reason, I do not know how advisable it would be to use ARIMA approaches instead (or whether missing values would still be a problem there).

Please share if you know a way to apply stl to a series with missing values, if you believe my choices are not methodologically sound, or if you have a better suggestion. I am quite new to the field and overwhelmed by the heaps of (seemingly…) relevant information.

Best Answer

ARIMA models can easily incorporate dummy variables to deal with missing values. These dummies are called Pulse Indicators: each one equals 1 at a single time point and 0 everywhere else, so its coefficient absorbs whatever value is recorded at that point. The methodology is straightforward and documented in http://www.unc.edu/~jbhill/tsay.pdf. In general, the method extracts from the current residual series information regarding Pulses, Level Shifts, Seasonal Pulses and Local Time Trends.
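As a rough illustration of the pulse-indicator idea in R, here is a minimal sketch using only base `stats`. Everything in it is an assumption made for the example: the synthetic monthly series, the positions of the missing values (`miss`), the injected spike at position 40, the AR(1) error model with a linear trend and one harmonic as regressors, and the 3 x MAD threshold. It is not the iterative intervention-detection procedure from the linked paper, only the missing-value part of it. Note also that `stats::arima()` accepts `NA` values directly, so the dummies are one option rather than a requirement.

```r
# Synthetic data: four years of monthly values (illustrative, not real climate data)
set.seed(1)
n <- 48
t_idx <- seq_len(n)
y <- 10 + 0.05 * t_idx + 3 * sin(2 * pi * t_idx / 12) + rnorm(n, sd = 0.5)
y[40] <- y[40] + 5            # injected anomaly, so there is something to detect
miss <- c(17, 30)             # hypothetical positions of missing observations
y[miss] <- NA
y <- ts(y, frequency = 12)

# Put an arbitrary placeholder at the missing positions; the pulse
# coefficients absorb it, so the chosen value does not drive the fit.
y_fill <- y
y_fill[miss] <- mean(y, na.rm = TRUE)

# One pulse indicator per missing observation: 1 at that time, 0 elsewhere
pulse <- sapply(miss, function(m) as.numeric(t_idx == m))
colnames(pulse) <- paste0("pulse", miss)

# Regressors: linear trend, one annual harmonic, and the pulse indicators
xreg <- cbind(trend = t_idx,
              sin12 = sin(2 * pi * t_idx / 12),
              cos12 = cos(2 * pi * t_idx / 12),
              pulse)

# Regression with AR(1) errors, fitted by maximum likelihood
fit <- arima(y_fill, order = c(1, 0, 0), xreg = xreg, method = "ML")

# Flag large residuals with a robust threshold, ignoring the missing positions
res <- as.numeric(residuals(fit))
res[miss] <- NA
thr <- 3 * mad(res, na.rm = TRUE)
flagged <- which(abs(res - median(res, na.rm = TRUE)) > thr)
print(flagged)
```

The missing positions are excluded from the flagging step because their residuals reflect the placeholder and the dummy, not an observation. In practice the ARIMA order and the regressors would need to be chosen for the actual data.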
