Solved – Determining the order of a Box-Jenkins modeling process

arima, box-jenkins, r

I am having trouble deciding which model class (AR, MA, ARMA, ARIMA, etc.) and which order, say (1,0,1), to use for my data under a Box-Jenkins procedure.

I have already tried many transformations on my data, but the errors are still large and the correlation is rather small. My data are stationary (ADF and KPSS tests) but not normally distributed (Anderson-Darling, Shapiro-Wilk, and Kolmogorov-Smirnov tests). I applied a natural log and tested again, but the result was still not normally distributed. I then differenced once, and the series is now stationary and normally distributed.

The data now satisfy the requirements for a Box-Jenkins procedure. I then used auto.arima in R to choose the order, and I also tried the Expert Modeler in SPSS as a cross-check.

My problem is that I still get large errors and a small R-squared. What should I do to determine the order? I also have trouble understanding ACFs and PACFs.
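Since the question turns on reading ACFs and PACFs, here is a minimal sketch of how the two are computed, using only the Python standard library. The function names `sample_acf` and `sample_pacf` are illustrative, not from any package, and the PACF uses the Durbin-Levinson recursion, which is one common estimator among several. The input is just the first twelve Harvest values listed below, far too few for real identification.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk itself.
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        pacf.append(phi_kk)
    return pacf


# First twelve Harvest values from the question, for illustration only.
harvest_head = [60477, 29323, 51369, 15800, 58994, 45496,
                17227, 92103, 138573, 39181, 51192, 13132]
print(sample_acf(harvest_head, 4))
print(sample_pacf(harvest_head, 4))
```

As a rough reading guide: for a pure AR(p) process the PACF cuts off after lag p while the ACF decays gradually; for a pure MA(q) process the ACF cuts off after lag q while the PACF decays. Values within about ±1.96/√n of zero are usually treated as not significant, which is a large-sample approximation.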

Below is my actual data:

Harvest

60477
29323
51369
15800
58994
45496
17227
92103
138573
39181
51192
13132
400
18258
54553
7220
1418
6807
17915
89015
122154
122853
63398
27246
27013
36317
65735
94744
78763
39769
20422
27398
33552
10000
6500
5300
5700
4800
5300
6450
9300
5834
29200
39975
65000
45494
79000
7900
54758
70581
31505
45437
29691
110947
40498
71238
42170
38723
64813
122992
17929
11652
134137
110043
60153
7625
25967
38918
1621
14946
76610
84516
72223
40399
63482
34918
63098
105388
135809
31345
66880
160511
40238
35767
105560
119276
154348
86935
73728
167119
128709
97040
21780
9906
62213
99940
72626
117783
58037
68756
25721
19853
4943
2027
20251
114718
27801
80868
94761
18914
119632
187924
56950
52886
141456
141507

[Figure: Harvest graph]

This is the differenced data:

d_Harvest

-31154
22046
-35569
43194
-13498
-28269
74876
46470
-99392
12011
-38060
-12732
17858
36295
-47333
-5802
5389
11108
71100
33139
699
-59455
-36152
-233
9304
29418
29009
-15981
-38994
-19347
6976
6154
-23552
-3500
-1200
400
-900
500
1150
2850
-3466
23366
10775
25025
-19506
33506
-71100
46858
15823
-39076
13932
-15746
81256
-70449
30740
-29068
-3447
26090
58179
-105063
-6277
122485
-24094
-49890
-52528
18342
12951
-37297
13325
61664
7906
-12293
-31824
23083
-28564
28180
42290
30421
-104464
35535
93631
-120273
-4471
69793
13716
35072
-67413
-13207
93391
-38410
-31669
-75260
-11874
52307
37727
-27314
45157
-59746
10719
-43035
-5868
-14910
-2916
18224
94467
-86917
53067
13893
-75847
100718
68292
-130974
-4064
88570
51
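As a quick check on how the differenced column relates to the original one, the sketch below applies a first difference to the opening Harvest values. The helper names `difference` and `undifference` are illustrative only. The result matches the first entries of d_Harvest, which suggests the listed values are first differences of the raw series rather than of its logarithm.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: x[t] - x[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undifference(first_value, diffs):
    """Invert a lag-1 difference by cumulative summation from first_value."""
    levels = [first_value]
    for d in diffs:
        levels.append(levels[-1] + d)
    return levels


# First six Harvest values from the question.
harvest_head = [60477, 29323, 51369, 15800, 58994, 45496]
d_head = difference(harvest_head)
print(d_head)  # [-31154, 22046, -35569, 43194, -13498]

# Forecasts made on the differenced scale must be summed back to levels.
assert undifference(harvest_head[0], d_head) == harvest_head
```

Differencing 116 observations leaves 115, which agrees with the length of the d_Harvest column.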


Best Answer

Your series of 116 weekly values is typical of many real-world series. The small sample size makes it hard to identify a seasonal ARIMA (SARIMA) model. The classical ARIMA identification scheme also presumes that there are no outliers, no level shifts, no seasonal dummies, and no deterministic time trend of the form 1, 2, 3, ..., t.

What follows is a more robust identification procedure, reflecting how ARIMA modeling has evolved to incorporate the deterministic structure listed above. Your series shows evidence of a few deterministic seasonal dummies, a time trend, and an ARIMA component of order (1,0,0). The fitted equation was posted as an image [image not available]. There are four weekly indicators: weeks 2, 8, 11, and 30. The ACF of the errors suggests that the residuals are random [image not available], and a residual plot confirms this [image not available].

The need for the four seasonal dummies and the time trend was identified following the work of Tsay and others, including myself; I helped develop the commercially available software used in this analysis. See http://www.unc.edu/~jbhill/tsay.pdf. A model summary was also posted as an image [image not available].
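The estimated coefficients cannot be recovered from the text. As a hedged sketch of the model class described in this answer (a time trend, four weekly indicator variables, and AR(1) errors), the general form could be written as follows. All symbols are illustrative notation rather than the software's output, and reading "week" as a position in the seasonal cycle is an assumption.

```latex
\begin{aligned}
y_t &= \beta_0 + \beta_1 t + \sum_{w \in \{2,\, 8,\, 11,\, 30\}} \delta_w D_{w,t} + e_t, \\
(1 - \phi B)\, e_t &= a_t,
\end{aligned}
```

Here $D_{w,t}$ equals 1 when observation $t$ falls in week $w$ and 0 otherwise, $B$ is the backshift operator ($B e_t = e_{t-1}$), and $a_t$ is white noise. The (1,0,0) order corresponds to the single autoregressive coefficient $\phi$ with no differencing.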
