Solved – raw data stationary, but trend and seasonality still visible in STL

arma · seasonality · stationarity

So I am looking at unit sales data and doing a univariate time series analysis. My data are weekly sales figures spanning 2012–2014 (the series ends before the end of 2014). I first plotted my response variable against time.
[plot: weekly sales against time]
I suspected trend and seasonality, which would make sense, so I ran a seasonal-trend decomposition using loess (STL). Here is the result.
[plot: STL decomposition of the weekly sales series]
However, when I ran stationarity tests on the original series, they all point to stationarity. I tried the Box-Ljung test, the Phillips-Perron unit root test, the augmented Dickey-Fuller test (with both the stationary and the explosive alternative), and the KPSS test. All say stationary. So why are my trend and seasonality not significant?
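In R, these tests look like this (using the tseries package; the series name sales is illustrative):

library(tseries)
adf.test(sales, alternative = "stationary")    # ADF, H1: stationary
adf.test(sales, alternative = "explosive")     # ADF, H1: explosive
pp.test(sales)                                 # Phillips-Perron, H1: stationary
kpss.test(sales)                               # KPSS, H0: stationary
Box.test(sales, lag = 20, type = "Ljung-Box")  # Box-Ljung autocorrelation test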
I assigned a month index to every observation and ran boxplot(sales ~ month, data = ???).
Here is the result:
[plot: boxplots of sales by month]
Then I asked myself whether the differences in monthly means were significant; they were not. I tried aov, pairwise.t.test, and TukeyHSD: even though the difference between some pairs of means was sizeable, overall it was not significant. Could I then say that no significant trend or seasonality appears in the dataset and model it with an ARIMA? Note that I have only 156 observations in my dataset.
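A sketch of this monthly comparison (the exact month-index construction and start date are illustrative, not the ones I used verbatim):

wk <- seq(as.Date("2012-01-02"), by = "week", length.out = length(sales))
month <- factor(format(wk, "%m"))    # month index "01".."12"
boxplot(c(sales) ~ month)
summary(aov(c(sales) ~ month))       # overall F test of equal monthly means
pairwise.t.test(c(sales), month)     # pairwise mean comparisons
TukeyHSD(aov(c(sales) ~ month))      # Tukey honest significant differences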
[plot: ACF and PACF of the sales series]
Here are the ACF and PACF. What ARMA would you specify, if an ARMA at all?
I am really at a loss; it is the first time I have seen a stationary raw sales series.

Best Answer

This is what I get using the R package forecast for ARIMA model selection and the package tsoutliers for detection of possible additive outliers, level shifts, and temporary changes.

These are the data:

x <- structure(c(1291, 2497, 2643, 2633, 2637, 3246, 2764, 2552, 2190, 2423, 2393, 2278, 1573, 792, 1826, 1819, 1610, 1344, 1762, 1917, 1920, 1804, 1936, 1912, 2000, 1742, 2120, 2483, 2475, 2252, 2061, 2145, 1901, 2223, 1934, 1439, 1963, 1879, 1587, 2467, 2852, 3811, 3154, 2428, 2881, 2625, 2416, 2538, 1947, 2526, 2421, 1277, 1334, 
2009, 2693, 2624, 2714, 2882, 3044, 2847, 2439, 2690, 2677, 2461, 1495, 2501, 2614, 2486, 2390, 1901, 2315, 2710, 2436, 2184, 2079, 2234, 2382, 2213, 2506, 2673, 2579, 2341, 2246, 2331, 1670, 2263, 2453, 2411, 2653, 2754, 2818, 2635, 2577, 2897, 2796, 2039, 2188, 
2749, 2790, 2510, 2128, 2839, 2270, 1791, 1373, 2078, 2878, 3198, 3095, 2964, 3230, 3122, 2809, 2902, 3048, 2984, 2927, 2740, 2732, 1572, 2829, 1831, 2821, 2907, 2973, 2690, 2511, 2632, 2502, 2590, 2907, 3046, 2953, 2663, 2613, 2637, 2061, 2812, 2660, 2653, 2708,  2921, 2862, 2526, 2863, 3002, 3227, 2254, 2606, 2959, 2963, 3302), 
.Tsp = c(2012, 2014.90384615385, 52), class = "ts")

Fit an ARIMA model:

require(tsoutliers)
# automatic ARIMA selection (by BIC) with joint detection of outliers
res <- tso(x, args.tsmethod = list(ic = "bic"))
res
# output
# Series: x 
# ARIMA(2,0,0) with non-zero mean 
# Coefficients:
#          ar1      ar2  intercept       TC6       AO14      LS40       AO65
#       0.6172  -0.2038  1960.4325  1024.713  -885.4593  494.7448  -932.6690
# s.e.  0.0836   0.0846    83.7532   296.303   252.3412  102.8566   250.3566
#          LS107       AO120      AO122
#       370.7175  -1219.8245  -987.1700
# s.e.   95.7657    252.4097   252.4363    
# sigma^2 estimated as 88411:  log likelihood=-1081.49
# AIC=2184.99   AICc=2186.87   BIC=2218.25    
# Outliers:
#   type ind    time coefhat  tstat
# 1   TC   6 2012:06  1024.7  3.458
# 2   AO  14 2012:14  -885.5 -3.509
# 3   LS  40 2012:40   494.7  4.810
# 4   AO  65 2013:13  -932.7 -3.725
# 5   LS 107 2014:03   370.7  3.871
# 6   AO 120 2014:16 -1219.8 -4.833
# 7   AO 122 2014:18  -987.2 -3.911

The data are fitted by a stationary AR(2) model, with the outliers summarized in the output above. According to this model, the data show cyclical behaviour rather than seasonality: an AR(2) process with complex characteristic roots generates pseudo-cyclical fluctuations with no fixed seasonal period.
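A quick check of the characteristic roots using the estimates above (my addition, not part of the original output):

ar1 <- 0.6172; ar2 <- -0.2038     # AR estimates from the output above
r <- polyroot(c(1, -ar1, -ar2))   # roots of 1 - ar1*z - ar2*z^2
r                                 # a complex conjugate pair
2 * pi / abs(Arg(r[1]))           # implied cycle period, roughly 7.7 weeks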

These are the residuals from the fitted model:

-562.694; 861.332; 215.01; 360.635; 400.555; -19.663; 63.132; 89.367; -159.542; 310.001; 101.865; 
80.454; -540.365; -10.707; -60.819; -131.584; -301.359; -436.549; 105.306; -50.281; -56.658; -142.147; 
62.602; -66.119; 63.857; -253.158; 302.135; 379.365; 224.416; 80.363; 25.394; 181.858; -152.887; 
336.83; -200.61; -451.631; 318.987; -189.269; -322.656; 225.7; 313.434; 1113.322; -57.1; -182.213; 
584.986; -98.521; -57.223; 141.604; -567.276; 401.332; -181.432; -1142.653; -401.002; 5.72; 284.743; 
-68.867; 203.088; 301.483; 378.136; 115.386; -138.022; 324.644; 73.6; -83.233; 14.096; 63.987; 124.937;
-57.862; -51.839; -507.671; 188.566; 228.418; -205.009; -207.419; -212.721; -44.265; -13.322; -242.081;
185.378; 137.111; -0.256; -146.214; -113.48; -18.343; -751.16; 267.111; -43.557; -81.991; 224.644; 
167.73; 218.705; 16.786; 84.769; 403.277; 92.964; -536.499; 59.121; 373.918; 99.044; -91.952; -292.79; 
596.918; -488.727; -471.684; -709.996; 155.382; 64.387; 263.095; 50.069; 47.84; 373.703; 74.842; 
-117.303; 146.866; 171.693; 36.535; 48.783; -116.079; -20.282; 6.377; 5.001; -16.583; 0.506; 82.552; 
96.052; -210.158; -201.05; -27.24; -268.39; -75.503; 160.697; 121.984; 7.788; -196.493; -86.462; 
-90.693; -691.693; 419.689; -313.173; -73.34; -44.991; 132.638; -46.613; -302.799; 229.549; 92.099; 
299.978; -783.563; 214.792; 152.29; 10.151; 418.609


plot(residuals(res$fit))

[plot: residuals of the fitted model]

Autocorrelations of the residuals:

par(mfrow = c(2,1), mar = c(5,4,4,3))
tmp <- acf(residuals(res$fit), lag.max = 53, plot = FALSE)
tmp$acf[1] <- NA  # blank out the lag-0 autocorrelation (always 1)
plot(tmp, ylim = c(-0.4, 0.4))
pacf(residuals(res$fit), lag.max = 53, ylim = c(-0.4, 0.4))

[plot: ACF and PACF of the residuals]
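As a complementary check (not shown in the original answer), a portmanteau test can be applied to the same residuals:

# Ljung-Box test up to one year of lags; fitdf = 2 for the two AR parameters
Box.test(residuals(res$fit), lag = 52, type = "Ljung-Box", fitdf = 2)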

The autocorrelations of the residuals reveal a significant lag of order 52 (a one-year lag for weekly data). One way to incorporate this lag is to define it as follows:

xlag52 <- ts(c(rep(NA, 52), head(x, -52)))  # value at time t is x[t-52]
tsp(xlag52) <- tsp(x)

and pass xlag52 through the argument xreg in function tso or arima. However, probably because the sample is too small for such a long lag, the algorithm runs into difficulties when fitting an ARIMA model.
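If you still want to try it, one workaround (an assumption on my part, not from the original answer) is to fit on the sub-sample where the lag is defined, dropping the first year:

x2 <- window(x, start = c(2013, 1))         # series without the first year
xlag <- window(xlag52, start = c(2013, 1))  # lag-52 regressor, no NAs here
fit_lag <- arima(x2, order = c(2, 0, 0), xreg = xlag)
fit_lag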

Edit

The idea of seasonal dummies mentioned by @IrishStat is much more appropriate than trying to add a lag of order 52. You can use these functions from the forecast package: seasonaldummy to generate seasonal dummies, or fourier to generate the trigonometric representation of the seasonal dummies. For illustration, and as a complement to the results already shown by @IrishStat, below I show what I get using the first 10 components of the seasonal cycles.
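(For reference, this is what seasonaldummy returns for this series; a quick illustration only, as the model below uses the Fourier terms instead:)

require(forecast)
sdum <- seasonaldummy(x)  # zero-one dummies for weeks 1..51 of the year
dim(sdum)                 # length(x) rows, 51 columns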

require(forecast)
# regressors: 10 pairs of Fourier terms for the annual cycle,
# a linear trend, and the outlier effects
seas <- fourier(x, K = 10)
trend <- seq_along(x)
mo <- outliers(c("TC", rep("AO", 5)), c(2, 14, 42, 65, 120, 148))
outl <- outliers.effects(mo, length(x))
res2 <- arima(x, order = c(1,0,0), xreg = cbind(seas, trend, outl))

This yields the following residuals:

plot(residuals(res2))

[plot: residuals from the second model]

par(mfrow = c(2,1), mar = c(5,4,4,3))
acf(residuals(res2), lag.max = 52)
pacf(residuals(res2), lag.max = 52)

[plot: ACF and PACF of the residuals from the second model]

Now we can see that the autocorrelation at lag 52 is much less prominent in the residuals.

The following estimate of the seasonal pattern is obtained:

# columns 3:22 of coef(res2) are the 20 Fourier coefficients
# (positions 1 and 2 are the AR(1) coefficient and the intercept)
fitted.seas <- ts(seas %*% coef(res2)[3:22])
tsp(fitted.seas) <- tsp(x)
plot(fitted.seas, type = "l", main = "estimated seasonal component")

[plot: estimated seasonal component]

Finally, as you used R for the results shown in your question, if you want to work with the model proposed by @IrishStat, you can do the following:

mo <- outliers(rep("AO", 5), c(14, 65, 42, 39, 120))
outl <- outliers.effects(mo, length(x))
# build one zero-one dummy per week of the year
seas <- matrix(0, nrow = length(x), ncol = 52)
for (i in seq(52))
  seas[seq(i, length(x), 52), i] <- 1
trend <- seq_along(x)
# drop the dummies for weeks 1, 18 and 33, as in @IrishStat's model
res3 <- arima(x, order = c(1,0,0), xreg = cbind(seas[,-c(1,18,33)], trend, outl))
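To inspect which regressors matter in this model (a quick check, not part of the original answer), approximate t-statistics can be computed from the estimated covariance matrix stored in the fit:

# approximate t-statistics for the coefficients of res3
round(coef(res3) / sqrt(diag(res3$var.coef)), 2)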