This is what I get using the R packages forecast (for ARIMA model selection) and tsoutliers (for detection of possible additive outliers, level shifts, or temporary changes).
These are the data:
x <- structure(c(1291, 2497, 2643, 2633, 2637, 3246, 2764, 2552, 2190, 2423, 2393, 2278, 1573, 792, 1826, 1819, 1610, 1344, 1762, 1917, 1920, 1804, 1936, 1912, 2000, 1742, 2120, 2483, 2475, 2252, 2061, 2145, 1901, 2223, 1934, 1439, 1963, 1879, 1587, 2467, 2852, 3811, 3154, 2428, 2881, 2625, 2416, 2538, 1947, 2526, 2421, 1277, 1334,
2009, 2693, 2624, 2714, 2882, 3044, 2847, 2439, 2690, 2677, 2461, 1495, 2501, 2614, 2486, 2390, 1901, 2315, 2710, 2436, 2184, 2079, 2234, 2382, 2213, 2506, 2673, 2579, 2341, 2246, 2331, 1670, 2263, 2453, 2411, 2653, 2754, 2818, 2635, 2577, 2897, 2796, 2039, 2188,
2749, 2790, 2510, 2128, 2839, 2270, 1791, 1373, 2078, 2878, 3198, 3095, 2964, 3230, 3122, 2809, 2902, 3048, 2984, 2927, 2740, 2732, 1572, 2829, 1831, 2821, 2907, 2973, 2690, 2511, 2632, 2502, 2590, 2907, 3046, 2953, 2663, 2613, 2637, 2061, 2812, 2660, 2653, 2708, 2921, 2862, 2526, 2863, 3002, 3227, 2254, 2606, 2959, 2963, 3302),
.Tsp = c(2012, 2014.90384615385, 52), class = "ts")
Fit an ARIMA model:
require(tsoutliers)
res <- tso(x, args.tsmethod = list(ic="bic"))
res
# output
# Series: x
# ARIMA(2,0,0) with non-zero mean
# Coefficients:
# ar1 ar2 intercept TC6 AO14 LS40 AO65
# 0.6172 -0.2038 1960.4325 1024.713 -885.4593 494.7448 -932.6690
# s.e. 0.0836 0.0846 83.7532 296.303 252.3412 102.8566 250.3566
# LS107 AO120 AO122
# 370.7175 -1219.8245 -987.1700
# s.e. 95.7657 252.4097 252.4363
# sigma^2 estimated as 88411: log likelihood=-1081.49
# AIC=2184.99 AICc=2186.87 BIC=2218.25
# Outliers:
# type ind time coefhat tstat
# 1 TC 6 2012:06 1024.7 3.458
# 2 AO 14 2012:14 -885.5 -3.509
# 3 LS 40 2012:40 494.7 4.810
# 4 AO 65 2013:13 -932.7 -3.725
# 5 LS 107 2014:03 370.7 3.871
# 6 AO 120 2014:16 -1219.8 -4.833
# 7 AO 122 2014:18 -987.2 -3.911
The data are fitted by a stationary AR(2) model with some outliers summarized in the output. According to this model, a cyclic behaviour rather than seasonality is observed in the data.
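As a quick check of that cyclic interpretation, the roots of the estimated AR(2) polynomial can be inspected in base R. Using the coefficients reported in the output above, the roots turn out to be complex and outside the unit circle, which corresponds to a stationary stochastic cycle; the standard AR(2) formula then gives the implied period (a sketch, using only the point estimates):

```r
# Estimated AR(2) coefficients from the output above
phi1 <- 0.6172
phi2 <- -0.2038

# Roots of the AR polynomial 1 - phi1*z - phi2*z^2
r <- polyroot(c(1, -phi1, -phi2))
Mod(r)  # both moduli > 1: the fitted model is stationary
Im(r)   # nonzero imaginary parts: complex roots, i.e. a stochastic cycle

# Period of the implied cycle: 2*pi / theta, with cos(theta) = phi1 / (2*sqrt(-phi2))
theta <- acos(phi1 / (2 * sqrt(-phi2)))
2 * pi / theta  # roughly an 8-week cycle
```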
These are the residuals from the fitted model.
-562.694; 861.332; 215.01; 360.635; 400.555; -19.663; 63.132; 89.367; -159.542; 310.001; 101.865;
80.454; -540.365; -10.707; -60.819; -131.584; -301.359; -436.549; 105.306; -50.281; -56.658; -142.147;
62.602; -66.119; 63.857; -253.158; 302.135; 379.365; 224.416; 80.363; 25.394; 181.858; -152.887;
336.83; -200.61; -451.631; 318.987; -189.269; -322.656; 225.7; 313.434; 1113.322; -57.1; -182.213;
584.986; -98.521; -57.223; 141.604; -567.276; 401.332; -181.432; -1142.653; -401.002; 5.72; 284.743;
-68.867; 203.088; 301.483; 378.136; 115.386; -138.022; 324.644; 73.6; -83.233; 14.096; 63.987; 124.937;
-57.862; -51.839; -507.671; 188.566; 228.418; -205.009; -207.419; -212.721; -44.265; -13.322; -242.081;
185.378; 137.111; -0.256; -146.214; -113.48; -18.343; -751.16; 267.111; -43.557; -81.991; 224.644;
167.73; 218.705; 16.786; 84.769; 403.277; 92.964; -536.499; 59.121; 373.918; 99.044; -91.952; -292.79;
596.918; -488.727; -471.684; -709.996; 155.382; 64.387; 263.095; 50.069; 47.84; 373.703; 74.842;
-117.303; 146.866; 171.693; 36.535; 48.783; -116.079; -20.282; 6.377; 5.001; -16.583; 0.506; 82.552;
96.052; -210.158; -201.05; -27.24; -268.39; -75.503; 160.697; 121.984; 7.788; -196.493; -86.462;
-90.693; -691.693; 419.689; -313.173; -73.34; -44.991; 132.638; -46.613; -302.799; 229.549; 92.099;
299.978; -783.563; 214.792; 152.29; 10.151; 418.609
plot(residuals(res$fit))
Autocorrelations of the residuals:
par(mfrow = c(2,1), mar = c(5,4,4,3))
tmp <- acf(residuals(res$fit), lag.max = 53, plot = FALSE)
tmp$acf[1] <- NA
plot(tmp, ylim = c(-0.4, 0.4))
pacf(residuals(res$fit), lag.max = 53, ylim = c(-0.4, 0.4))
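As a numeric complement to these plots, base R's Box.test gives a portmanteau check of residual autocorrelation. The sketch below uses a simulated white-noise series of the same length as a stand-in so it runs on its own; in practice you would pass residuals(res$fit) and set fitdf = 2 to account for the two estimated AR coefficients:

```r
# Ljung-Box test up to one seasonal lag (52 weeks);
# resid_demo is a white-noise stand-in for residuals(res$fit)
set.seed(1)
resid_demo <- rnorm(155)
lb <- Box.test(resid_demo, lag = 52, type = "Ljung-Box")
lb
```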
The autocorrelations of the residuals reveal the significance of the lag of order 52 (one year lag). One way to incorporate this lag is to define it as follows:
xlag52 <- ts(c(rep(NA, 52), head(x, -52)))  # x[t-52], padded with leading NAs
tsp(xlag52) <- tsp(x)
and pass xlag52 through the argument xreg in the function tso or arima. However, probably because the sample is too small for such a long lag, the algorithm runs into difficulties when fitting an ARIMA model.
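For completeness, here is a sketch of what such a call could look like. The series is simulated here so the snippet runs standalone (with the real data, use x and xlag52 as defined above); fitting on the window from 2013 onwards keeps the leading NAs of the lagged regressor out of arima:

```r
# Simulated stand-in for x: a stationary weekly series of the same length
set.seed(123)
x_demo <- ts(2000 + 300 * as.numeric(arima.sim(list(ar = 0.5), 155)),
             start = 2012, frequency = 52)
xlag52_demo <- ts(c(rep(NA, 52), head(x_demo, -52)))
tsp(xlag52_demo) <- tsp(x_demo)

# Fit on the subsample where the lagged regressor is available,
# so the leading NAs never reach arima()
fit52 <- arima(window(x_demo, start = c(2013, 1)),
               order = c(2, 0, 0),
               xreg = window(xlag52_demo, start = c(2013, 1)))
coef(fit52)
```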
Edit
The idea of seasonal dummies mentioned by @IrishStat is much more appropriate than trying to add a lag of order 52. You can use these functions from the forecast package: seasonaldummy, to generate seasonal dummies, or fourier, to generate the trigonometric representation of seasonal dummies. For illustration, and as a complement to the results already shown by @IrishStat, below I show what I get using the first 10 harmonics of the seasonal cycle.
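For reference, fourier(x, K = 10) returns a matrix with 2*K = 20 columns: a sine and a cosine at each of the first 10 harmonics of the 52-week cycle. The same regressors can be built by hand in base R (a sketch; forecast's column names and ordering will differ):

```r
# Hand-built Fourier terms for a series of length n with period 52
n <- 155   # length(x)
period <- 52
K <- 10
t <- seq_len(n)
seas_manual <- do.call(cbind, lapply(seq_len(K), function(k) {
  cbind(sin(2 * pi * k * t / period),
        cos(2 * pi * k * t / period))
}))
dim(seas_manual)  # 155 x 20
```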
seas <- fourier(x, K=10)
trend <- seq_along(x)
mo <- outliers(c("TC", rep("AO", 5)), c(2, 14, 42, 65, 120, 148))
outl <- outliers.effects(mo, length(x))
res2 <- arima(x, order = c(1,0,0), xreg = cbind(seas, trend, outl))
This yields the following residuals:
plot(residuals(res2))
par(mfrow = c(2,1), mar = c(5,4,4,3))
acf(residuals(res2), lag.max = 52)
pacf(residuals(res2), lag.max = 52)
Now, we can see that the autocorrelation at lag 52 is much less prominent in the residuals.
The following estimate of the seasonal pattern is obtained:
fitted.seas <- ts(seas %*% coef(res2)[3:22])
tsp(fitted.seas) <- tsp(x)
plot(fitted.seas, type = "l", main = "estimated seasonal component")
Finally, as you used R for the results shown in your question, if you want to work with the model proposed by @IrishStat, you can do the following:
mo <- outliers(rep("AO", 5), c(14, 65, 42, 39, 120))
outl <- outliers.effects(mo, length(x))
seas <- matrix(0, nrow = length(x), ncol = 52)
for (i in seq(52))
seas[seq(i, length(x), 52),i] <- 1
trend <- seq_along(x)
res3 <- arima(x, order = c(1,0,0), xreg = cbind(seas[,-c(1,18,33)], trend, outl))
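A quick sanity check on the hand-built dummy matrix: each observation should be assigned to exactly one of the 52 weekly dummies. Columns 1, 18 and 33 are dropped in the arima call above; at least one column must be dropped to avoid exact collinearity with the intercept. A standalone sketch (the dummies are rebuilt here so the check runs on its own):

```r
# Rebuild the weekly dummies (as above) and verify that each
# observation belongs to exactly one week-of-year column
n <- 155  # length(x)
seas_check <- matrix(0, nrow = n, ncol = 52)
for (i in seq(52))
  seas_check[seq(i, n, 52), i] <- 1
all(rowSums(seas_check) == 1)  # should be TRUE
```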
Best Answer
See How to detect seasonality from plotted data without using tools or libraries and the linked discussion of when and why you should take logs; both might be of help to you. Untreated deterministic structure (pulses, level shifts, time trends) often incorrectly suggests transforms. Visually, your series might have changes in trend and/or changes in intercept, in addition to some form of ARIMA structure. If you post your data I will try to help further.
Detrending, power transformations, differencing, and ARMA are all forms of transformation. Determining the minimally sufficient (parsimonious) combination requires skillful technique. Simple scripts, i.e. hard-and-fast rules, are to be studiously avoided, as they limit the scope of the solution and often obfuscate.
I should also add that there are two other forms of transformation often suggested by the data ...
The opportunity space, correctly evaluated, can often lead to a "useful model" ... à la GEPB.
EDITED AFTER RECEIPT OF DATA.
Using AUTOBOX, a piece of software that I have helped to develop, the following model was identified automatically: first differences with an AR(2) without lag 1 and three pulses, two of them at the end of the series, using the most recent 29 values. Note that an AR(2) model does not necessarily include both lags, whereas some solutions require that to be true (auto.arima, for example).
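For readers who want to express that model form in base R's arima: "AR(2) without lag 1" can be written by constraining the lag-1 coefficient to zero through the fixed argument (transform.pars must then be FALSE). The sketch below uses a simulated series and omits the pulses, which would enter through xreg; it illustrates the constraint only and does not reproduce AUTOBOX's estimates:

```r
# Simulated integrated series whose differences follow an AR(2)
# with a zero lag-1 coefficient
set.seed(42)
y <- cumsum(as.numeric(arima.sim(list(ar = c(0, 0.4)), 150)))

# First differences, AR(2) with ar1 fixed at 0 (NA = freely estimated);
# pulse dummies would additionally be supplied through xreg
fit_ar2 <- arima(y, order = c(2, 1, 0),
                 fixed = c(0, NA),
                 transform.pars = FALSE)
coef(fit_ar2)
```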
The model statistics and the ACF of the residuals are shown in the linked output. The Actual and Adjusted plot highlights the "unusual values". Of particular interest are the two anomalies at the tail end of the series (information extracted from the data). If one were to believe these two values, rather than adjusting for them, the subsequent forecast would be higher than the one shown.
A possibly useful model would then have no power transform and no weighted least squares, but would include differences, ARMA structure, and three pulses, after segmenting the data into two distinct time ranges: 1-44 and 45-73.