It appears you are looking for spikes within intervals of relative quiet. "Relative" means compared to typical nearby values, which suggests smoothing the series. A robust smooth is desirable precisely because it should not be influenced by a few local spikes. "Quiet" means variation around that smooth is small. Again, a robust estimate of local variation is desirable. Finally, a "spike" would be a large residual as a multiple of the local variation.
To implement this recipe, we need to choose (a) what "nearby" means, (b) a method for smoothing, and (c) a method for estimating local variation. You may have to experiment with (a), so let's make it an easily controllable parameter. Good, readily available choices for (b) and (c) are lowess and the IQR, respectively. Here is an R implementation:
```r
library(zoo)  # for rollapply(), the local (moving-window) IQR

f <- function(x, width = 7) {    # width = size of moving window in time steps
  w <- width / length(x)         # lowess span as a fraction of the series length
  y <- lowess(x, f = w)          # the smooth
  r <- zoo(x - y$y)              # its residuals, structured for the next step
  z <- rollapply(r, width, IQR)  # the running estimate of variability
  r / z                          # the diagnostic series: residuals scaled by IQRs
}
```
As an example of its use, consider these simulated data where two successive spikes are added to a quiet period (two in a row should be harder to detect than one isolated spike):
```r
> x <- c(rnorm(192, mean=0, sd=1), rnorm(96, mean=0, sd=0.1), rnorm(192, mean=0, sd=1))
> x[240:241] <- c(1,-1) # Add a local spike
> plot(x)
```
Here is the diagnostic plot:
```r
> u <- f(x)
> plot(u)
```
Despite all the noise in the original data, this plot beautifully detects the (relatively small) spikes in the center. Automate the detection by scanning f(x) for largish values (larger than about 5 in absolute value: experiment to see what works best with sample data).
```r
> spikes <- u[abs(u) >= 5]
> spikes
240 241 273
9.274959 -9.586756 6.319956
```
The spurious detection at time 273 was a random local outlier. You can refine the test to exclude (most) such spurious values by modifying f to look for simultaneously high values of the diagnostic r/z and low values of the running IQR, z. However, although the diagnostic has a universal (unitless) scale and interpretation, the meaning of a "low" IQR depends on the units of the data and has to be determined from experience.
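A minimal sketch of that refinement follows, written in base R (a loop stands in for zoo's rollapply(), so the block is self-contained). diag_spikes() returns both the scaled residual and the running IQR, and find_spikes() flags a time only when |r/z| is large and the IQR is small. Both function names and the max_iqr argument are hypothetical; max_iqr deliberately has no default, because a "low" IQR has to be chosen in the units of your data.

```r
# Base-R variant of f(): scaled residuals plus the running IQR itself.
diag_spikes <- function(x, width = 7) {
  n <- length(x)
  h <- (width - 1) %/% 2                  # half-window; width is assumed odd
  r <- x - lowess(x, f = width / n)$y     # residuals from the robust smooth
  z <- rep(NA_real_, n)                   # running IQR, NA where the window is incomplete
  for (i in (h + 1):(n - h)) {
    z[i] <- IQR(r[(i - h):(i + h)])       # centered moving-window IQR
  }
  data.frame(time = seq_len(n), ratio = r / z, iqr = z)
}

# Flag times where |r/z| is large AND the local IQR is small.
# max_iqr is a hypothetical, data-dependent threshold in the units of x.
find_spikes <- function(x, max_iqr, width = 7, min_ratio = 5) {
  d <- diag_spikes(x, width)
  keep <- !is.na(d$ratio) & abs(d$ratio) >= min_ratio & d$iqr <= max_iqr
  d$time[keep]
}

# Usage on data like the simulated example above
x <- c(rnorm(192, sd = 1), rnorm(96, sd = 0.1), rnorm(192, sd = 1))
x[240:241] <- c(1, -1)
find_spikes(x, max_iqr = 0.5)
```

In practice you would calibrate max_iqr on stretches of your own data that you know to be quiet.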
It sounds like you want to fit an ARIMAX model to your time series. I would try fitting an ADL (autoregressive distributed lag) model or an ECM (error correction model), or applying the Engle-Granger two-step procedure, to see whether your series cointegrate and to estimate the long-run relationship between them if they do. If they do not cointegrate, continue with the ARIMAX model or estimate stationary ADL or ECM models. Note that an ADL model and an ARIMAX model are very similar. Although cointegration analysis with several variables is quite an endeavour and fills entire textbooks (see e.g. Katarina Juselius' “The Cointegrated VAR Model: Methodology and Applications”), cointegration analysis with only two variables is fairly quick and easy, depending on the approach you choose. Part of this answer repeats what I wrote in reply to a similar question. I will outline the steps you should follow to model the time series appropriately.

Remember first that there are different kinds of non-stationarity and different ways of dealing with them. Four common ones are:
1) Deterministic trends (trend stationarity). If your series is of this kind, de-trend it or include a time trend in the regression/model. You might want to check out the Frisch–Waugh–Lovell theorem on this one.
2) Level shifts and structural breaks. If this is the case, include a dummy variable for each break or, if your sample is long enough, model each regime separately.
3) Changing variance. Either model the subsamples separately or model the changing variance using the ARCH/GARCH class of models.
4) Unit roots. If your series contains a unit root, you should in general check for cointegrating relationships between the variables, but since you are concerned with univariate forecasting you should difference it once or twice, depending on the order of integration.

The steps to model the series:
1) Look at the ACF and PACF together with a time series plot to get an indication of whether the series is stationary. If the ACF decays very slowly and the plot looks like it exhibits a unit root (not mean reverting), this is a good indication that the series does contain a unit root.
2) Test the series for a unit root. This can be done with a wide range of tests, some of the most common being the ADF test, the Phillips-Perron (PP) test, the KPSS test (which has the null of stationarity) and the DF-GLS test, which is the most efficient of the aforementioned tests. Note that if your series contains a structural break, these tests are biased towards not rejecting the null of a unit root. If you want to check the robustness of these tests and you suspect one or more structural breaks, you should use endogenous structural break tests. Two common ones are the Zivot-Andrews test, which allows for one endogenous structural break, and the Clemente-Montañés-Reyes test, which allows for two. The latter comes in two versions: an additive outlier (AO) model, which captures a sudden change in the mean of the series, and an innovational outlier (IO) model, which allows for a gradual shift. Look these tests up on Wikipedia or in an econometrics textbook. Some statistical packages have these tests built in, which makes running a battery of unit root tests on your series very easy.
If your series contains a unit root, test the first differences of your series to see whether they contain a second unit root.
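For steps 1) and 2), a minimal base-R sketch follows. adf_stat() is a hypothetical helper that runs the ADF regression with a constant (no trend) and a hand-picked lag length p, and returns the t-statistic on the lagged level; it does not supply critical values. Base R's PP.test() gives the Phillips-Perron test directly, and packages such as tseries (adf.test) or urca (ur.df) provide packaged ADF tests.

```r
# Augmented Dickey-Fuller t-statistic with a constant and no trend, by OLS:
#   dy_t = a + rho * y_{t-1} + g_1 * dy_{t-1} + ... + g_p * dy_{t-p} + e_t
# The unit-root null is rho = 0; compare the t-statistic with Dickey-Fuller
# critical values (about -2.86 at 5% in large samples), not the t table.
adf_stat <- function(y, p = 1) {
  dy <- diff(y)
  m <- embed(dy, p + 1)                   # col 1: dy_t; cols 2..p+1: lagged dy
  colnames(m) <- c("dy", if (p > 0) paste0("dy_lag", seq_len(p)))
  d <- data.frame(m, ylag = y[(p + 1):(length(y) - 1)])  # y_{t-1}, aligned with dy_t
  fit <- lm(dy ~ ., data = d)
  summary(fit)$coefficients["ylag", "t value"]
}

set.seed(1)
rw <- cumsum(rnorm(500))                  # simulated random walk (has a unit root)
acf(rw, lag.max = 20, plot = FALSE)       # step 1: slowly decaying ACF
adf_stat(rw)                              # step 2: test the levels
adf_stat(diff(rw))                        # then the first differences
PP.test(diff(rw))                         # Phillips-Perron test from base R
```

On a random walk the levels statistic will typically stay above the 5% critical value, so the unit root is not rejected, while the differenced series gives a strongly negative statistic.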
3) In case your series are non-stationary then you should:
A) Apply the Engle-Granger 2-step procedure
B) Apply an ADL model
C) Apply an ECM model
Note that you could use the Johansen cointegration test or other tests, but for simplicity these are left out; in your case, with only two time series, any one of A), B) or C) will suffice. Note also that although the Engle-Granger procedure is easier to apply (at least I think so), the ADL/ECM estimators are preferable, as a Monte Carlo simulation will show.
I will not explain all these approaches and how to derive the long-run solution, as that would take a considerable amount of time and space, but here is an excellent introduction to these methods:
http://www.econ.ku.dk/metrics/Econometrics2_07_I/LectureNotes/Cointegration.pdf
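To make step 3) more concrete, here is a sketch of the Engle-Granger two-step procedure for two series, ending in the ECM that uses the step-1 residual as the error correction term. engle_granger() is a hypothetical helper; the residual test is a plain Dickey-Fuller regression without lags (add lagged differences if the residuals are autocorrelated), and the -3.34 in the comment is the approximate 5% asymptotic MacKinnon critical value for two variables with a constant.

```r
# Engle-Granger two-step procedure for two series y and x (numeric vectors).
engle_granger <- function(y, x) {
  # Step 1: static long-run regression  y_t = a + b * x_t + u_t
  step1 <- lm(y ~ x)
  u <- residuals(step1)

  # Unit-root check on the residuals: DF regression without a constant,
  #   du_t = rho * u_{t-1} + e_t.
  # Cointegration critical values apply (approx. -3.34 at 5% for two
  # variables), not the ordinary Dickey-Fuller ones.
  du <- diff(u)
  ulag <- u[-length(u)]
  df_t <- summary(lm(du ~ 0 + ulag))$coefficients["ulag", "t value"]

  # Step 2: error correction model with the lagged step-1 residual,
  #   dy_t = c + g * dx_t + alpha * u_{t-1} + e_t   (alpha < 0 expected)
  dy <- diff(y)
  dx <- diff(x)
  ecm <- lm(dy ~ dx + ulag)

  list(long_run = coef(step1), df_t = df_t, ecm = coef(ecm))
}

set.seed(2)
x <- cumsum(rnorm(500))          # I(1) regressor
y <- 1 + 2 * x + rnorm(500)      # cointegrated with x by construction
engle_granger(y, x)
```

A negative and significant alpha in the ECM is the error correction mechanism: deviations from the long-run relation y = a + b x are partly corrected in the next period.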
4) When choosing the lag length for your ADL model, include enough lags to eliminate all residual autocorrelation.
5) After your cointegration analysis you are more or less done. Note that if you want to extend your model to several variables, you should use the CVAR model, and the analysis becomes a lot more complicated, as mentioned above.
6) If your variables do not cointegrate but contain a unit root, continue with your ARIMAX modelling:
A) Difference the series
B) Choose the lag length according to the ACF and PACF. Pick the best model according to the AIC, BIC or HQ criteria and test for residual autocorrelation using the Ljung-Box Q test. Test the significance of your variables.
C) Estimate an ADL/ECM model for your data. Include enough lags to remove serial correlation, and test the significance of the variables.
7) If your variables are stationary, estimate a stationary ADL/ECM model for your data or proceed with your ARIMAX model, following the same steps as in 6). An excellent introductory note on the stationary models can be found here: http://www.econ.ku.dk/metrics/Econometrics2_07_I/LectureNotes/dynamicmodels.pdf

If your series contains a unit root with a drift, or no unit root but a deterministic trend, you can add a time trend to your specification. Further, check the first differences of the series and the time series plots to see whether your series contains a structural break and/or outliers, and include dummy variables for these. Note that you should test for structural breaks (see point 2) above); another alternative is the Chow test. Finally, it could be an idea to take natural logs of your variables, as this can stabilize the variance of the series. The log transformation will not change the ordering of the data, as it is a monotonic transformation.
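Steps 4) and 6) above can be sketched in base R as follows, under stated assumptions: choose_adl_lag() fits ADL(p, p) models by OLS and increases p until the Ljung-Box test no longer rejects white-noise residuals (using fitdf = p there is a simplifying assumption), and fit_arimax() searches a small grid of ARIMA(p, 1, q) orders by AIC. Both helpers are hypothetical. Note that R's arima() with xreg fits a regression with ARIMA errors, which is one common reading of "ARIMAX".

```r
# ADL(p, p) by OLS:
#   y_t = c + sum_{i=1..p} a_i y_{t-i} + sum_{i=0..p} b_i x_{t-i} + e_t
fit_adl <- function(y, x, p) {
  Y <- embed(y, p + 1)             # columns: y_t, y_{t-1}, ..., y_{t-p}
  X <- embed(x, p + 1)             # columns: x_t, x_{t-1}, ..., x_{t-p}
  lm(Y[, 1] ~ Y[, -1] + X)
}

# Step 4): smallest p (up to max_p) whose residuals pass the Ljung-Box test.
choose_adl_lag <- function(y, x, max_p = 8, lb_lag = 12, alpha = 0.05) {
  stopifnot(lb_lag > max_p)        # keeps the Ljung-Box degrees of freedom positive
  for (p in seq_len(max_p)) {
    fit <- fit_adl(y, x, p)
    lb <- Box.test(residuals(fit), lag = lb_lag, type = "Ljung-Box", fitdf = p)
    if (lb$p.value > alpha) return(list(p = p, fit = fit, ljung_box = lb))
  }
  warning("residual autocorrelation remains at max_p")
  list(p = max_p, fit = fit, ljung_box = lb)
}

# Step 6): regression with ARIMA(p, 1, q) errors over a small grid, lowest AIC.
# arima() differences both y and xreg when d = 1.
fit_arimax <- function(y, xreg, max_p = 2, max_q = 2) {
  best <- NULL
  for (p in 0:max_p) {
    for (q in 0:max_q) {
      fit <- tryCatch(arima(y, order = c(p, 1, q), xreg = xreg),
                      error = function(e) NULL)   # skip orders that fail to fit
      if (!is.null(fit) && (is.null(best) || fit$aic < best$aic)) best <- fit
    }
  }
  best
}

# Simulated examples
set.seed(3)
n <- 400
x_adl <- rnorm(n)
y_adl <- numeric(n)
for (t in 3:n) {
  y_adl[t] <- 0.5 * y_adl[t - 1] + 0.3 * y_adl[t - 2] +
    x_adl[t] + 0.5 * x_adl[t - 1] + rnorm(1)
}
res <- choose_adl_lag(y_adl, x_adl)
res$p

x_arimax <- cumsum(rnorm(300))                 # I(1) exogenous series
y_arimax <- 3 * x_arimax + cumsum(rnorm(300))  # I(1), not cointegrated with x
best <- fit_arimax(y_arimax, x_arimax)
best
lb <- Box.test(residuals(best), lag = 12, type = "Ljung-Box",
               fitdf = sum(best$arma[1:2]))    # fitdf = p + q of the chosen model
lb
```

Swapping AIC for BIC or HQ in fit_arimax() only changes the comparison line; the Ljung-Box check afterwards is the same.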
Hopefully this made some sense. Please note that this was a very short introduction and that the topic could easily fill several chapters of a textbook. I strongly recommend reading the two lecture notes I linked to, or getting hold of a textbook on time series analysis/econometrics. If you need help understanding some of the concepts, please feel free to ask! Model specifications and examples are all included in the lecture notes I linked to.
Unit root tests: ADF, Phillips-Perron (look up dfuller and pperron in Stata).
If your data are not stationary, you will have to take steps to make them stationary (for example, taking first differences).
Cointegration
You can use the Johansen test, but it can sometimes be tricky because it doesn't allow for any gaps in the data. The alternative is to test for cointegration by regression.
In Stata, this would be:
```stata
regress y x                 // regression of the variables
predict resid, residuals    // creates variable resid, the residuals of the regression
// ADF test to see if the residuals are stationary; because resid comes from an
// estimated regression, judge it against Engle-Granger (MacKinnon) critical values
dfuller resid, lags(15)
```
Structural change
With the Chow test, you need to know the break date in advance and then test for it. If you don't know when the change occurred, I suggest the QLR test for coefficient stability.
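Both tests can be sketched in base R (an R analogue of the idea, not Stata code). chow_test() and qlr_test() are hypothetical helpers for a simple regression y ~ x; the sketch assumes equal error variance before and after the break, and the 15% trimming used for QLR is a common convention rather than a requirement. The sup-F statistic must be compared with Andrews (1993) critical values, not the usual F table.

```r
# Chow test for a known break after observation k in the regression y ~ x.
chow_test <- function(y, x, k) {
  n <- length(y)
  n_par <- 2                                   # intercept and slope
  rss <- function(idx) sum(residuals(lm(y[idx] ~ x[idx]))^2)
  rss_pooled <- rss(seq_len(n))                # one regression for the whole sample
  rss_split <- rss(1:k) + rss((k + 1):n)       # separate regressions before/after k
  f <- ((rss_pooled - rss_split) / n_par) / (rss_split / (n - 2 * n_par))
  c(F = f, p.value = pf(f, n_par, n - 2 * n_par, lower.tail = FALSE))
}

# QLR (sup-F): the largest Chow F over candidate break dates in the central
# (1 - 2 * trim) share of the sample.
qlr_test <- function(y, x, trim = 0.15) {
  n <- length(y)
  candidates <- seq(floor(trim * n), ceiling((1 - trim) * n))
  f_stats <- sapply(candidates, function(k) chow_test(y, x, k)[["F"]])
  list(sup_F = max(f_stats), break_at = candidates[which.max(f_stats)])
}

set.seed(5)
x <- rnorm(200)
y <- c(1 + x[1:100], 4 + x[101:200]) + rnorm(200, sd = 0.5)  # intercept shift after t = 100
chow_test(y, x, k = 100)

# The same Chow F from a break dummy interacted with x
brk <- seq_along(y) > 100
anova(lm(y ~ x), lm(y ~ x * brk))

qlr_test(y, x)
```

The dummy-variable version gives the identical F statistic, because the fully interacted model is equivalent to fitting separate regressions before and after the break.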
Are you doing a VAR/SVAR? If so, you also need lag selection criteria, so that you know how many past time periods affect the current period. For this, look up the varsoc command in Stata (you need your series to be stationary based on the tests in step 1, because only stationary series can be written as moving average processes).
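As an R analogue of lag selection (not a reimplementation of varsoc, which reports several criteria at once), base R's ar() can choose the order of a multivariate autoregression by AIC. The simulated VAR(1) below and its coefficient matrix A are illustrative assumptions.

```r
# Simulate a stationary bivariate VAR(1) and let ar() pick the order by AIC.
set.seed(7)
n <- 500
A <- matrix(c(0.5, 0.2, 0.1, 0.4), 2, 2)     # VAR(1) coefficients (eigenvalues 0.6, 0.3)
z <- matrix(0, n, 2, dimnames = list(NULL, c("y1", "y2")))
for (t in 2:n) z[t, ] <- A %*% z[t - 1, ] + rnorm(2)

fit <- ar(z, aic = TRUE, order.max = 8)      # Yule-Walker fit of a multivariate AR
fit$order                                    # selected lag length
fit$aic                                      # AIC differences from the best order
```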
After you select your lag length, you can see the results of the VAR/SVAR model (command varbasic in Stata).
If you want to test Granger-causality, look up vargranger.
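vargranger runs its tests after a VAR has been fitted; the single-equation idea behind it can be sketched in base R as an F test comparing a regression of y on its own lags with one that also includes lags of x. granger_test() is a hypothetical helper, assumes both series are stationary, and is not a reimplementation of vargranger.

```r
# Granger-causality F test: do lags of x help predict y beyond y's own lags?
granger_test <- function(y, x, p = 2) {
  Y <- embed(y, p + 1)                   # y_t and its lags 1..p
  X <- embed(x, p + 1)                   # x_t and its lags 1..p (x_t itself unused)
  restricted <- lm(Y[, 1] ~ Y[, -1, drop = FALSE])
  unrestricted <- lm(Y[, 1] ~ Y[, -1, drop = FALSE] + X[, -1, drop = FALSE])
  anova(restricted, unrestricted)        # F test that all x-lag coefficients are 0
}

set.seed(8)
n <- 500
x <- rnorm(n)
y <- c(0, 0.5 * x[-n]) + rnorm(n)        # y depends on x_{t-1} by construction
granger_test(y, x)                       # does x Granger-cause y?
granger_test(x, y)                       # does y Granger-cause x?
```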
The remaining tests depend on the specifics of your data. You could test for autocorrelation using estat dwatson or estat bgodfrey in Stata if you need to (or, even if you don't, just to see).
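For reference, the statistics behind those two commands can be written out in base R; these are textbook formulas, not Stata's code (the lmtest package's dwtest() and bgtest() are the packaged R versions). durbin_watson() and breusch_godfrey() are hypothetical helpers, and filling pre-sample lagged residuals with zeros is one common convention.

```r
# Durbin-Watson statistic from the residuals of an lm fit (near 2: no AR(1)).
durbin_watson <- function(fit) {
  e <- residuals(fit)
  sum(diff(e)^2) / sum(e^2)
}

# Breusch-Godfrey LM test of order p: regress the residuals on the original
# regressors plus p lagged residuals (pre-sample lags set to 0);
# LM = n * R^2 is compared with a chi-squared(p) distribution.
breusch_godfrey <- function(fit, p = 1) {
  e <- residuals(fit)
  n <- length(e)
  lags <- sapply(seq_len(p), function(i) c(rep(0, i), e[seq_len(n - i)]))
  aux <- lm.fit(cbind(model.matrix(fit), lags), e)
  r2 <- 1 - sum(aux$residuals^2) / sum((e - mean(e))^2)
  lm_stat <- n * r2
  c(LM = lm_stat, p.value = pchisq(lm_stat, df = p, lower.tail = FALSE))
}

set.seed(9)
n <- 300
x <- rnorm(n)
u <- as.numeric(stats::filter(rnorm(n), 0.7, method = "recursive"))  # AR(1) errors
y <- 1 + 2 * x + u
fit <- lm(y ~ x)
durbin_watson(fit)
breusch_godfrey(fit, p = 2)
```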
Also, you should note that if you find a structural break (using Chow test or QLR), it is common to repeat the analysis on the two time periods (before the break and after the break), to identify what the effects were before and after the structural change.
I recommend you check out this link http://www.princeton.edu/~otorres/TS101.pdf.
Good luck!