Solved – Using non-stationary time series data in OLS regression

ardl, least squares, macroeconomics, stationarity, time series

I am using annual data from 1983-2008 to test whether the Gini coefficients and gross national saving of China and the US affect the US current account balance. The data seem to be non-stationary, but I am a beginner and only know the basic multiple regression model and the autoregressive distributed lag (ADL) model. Can I still use these models to test the effects? I know the estimates would be biased and inaccurate, but do they give any useful information? My chosen control variables are real GDP, the interest rate, the dollar index and perhaps some other national income components.

Best Answer

You can do anything you want, especially if it's a term paper or something of that nature.

To obtain useful results you can't use non-stationary data with OLS in a time-series setting. There are other, more advanced methods for which non-stationarity is a non-issue. With OLS you have to difference real GDP and indices, and in many cases also apply a log transform.
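As a small illustration of the log-and-difference transformation, here is a minimal sketch using made-up GDP levels (hypothetical numbers, not real data). The first difference of the log is approximately the growth rate, which is why log-differenced GDP is usually read as "percent growth".

```python
import numpy as np

# Hypothetical annual real GDP levels growing 5% a year (made-up numbers)
gdp = np.array([100.0, 105.0, 110.25, 115.7625])

# Log transform, then first difference: log growth rates
log_growth = np.diff(np.log(gdp))

# Exact percentage growth rates, for comparison
pct_growth = gdp[1:] / gdp[:-1] - 1

print(log_growth)  # each value is ln(1.05), roughly 0.0488
print(pct_growth)  # each value is roughly 0.05
```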

UPDATE: when using non-stationary variables in OLS you run into the potentially fatal issue of spurious regression; there's a large literature on this subject. Basically, in most cases your regression results will be garbage. You may see highly significant coefficients, but the significance is artificial and disappears when you run a proper regression.
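The spurious regression problem is easy to see in a small Monte Carlo experiment. The sketch below (simulated data only; the sample size, number of simulations and seed are arbitrary assumptions) regresses one random walk on another, independent one, and counts how often the slope looks "significant" at the nominal 5% level. In levels the rejection rate is far above 5%; after differencing it is close to 5%.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_t_stat(y, x):
    """OLS of y on a constant and x; return the conventional t-statistic of the slope."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

n_obs, n_sims = 100, 500
reject_levels = reject_diffs = 0
for _ in range(n_sims):
    # Two INDEPENDENT simulated random walks: the true relationship is zero
    x = np.cumsum(rng.standard_normal(n_obs))
    y = np.cumsum(rng.standard_normal(n_obs))
    reject_levels += abs(slope_t_stat(y, x)) > 1.96
    reject_diffs += abs(slope_t_stat(np.diff(y), np.diff(x))) > 1.96

rate_levels = reject_levels / n_sims
rate_diffs = reject_diffs / n_sims
print(f"false 'significance' in levels: {rate_levels:.2f}, in differences: {rate_diffs:.2f}")
```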

There's an even more subtle phenomenon called "cointegration", but since you're working on an undergrad paper, I would not worry about it. In fact, if your major is not statistics or econometrics, I imagine your instructor will not penalize you for improper use of regressions.

Clarification: you can use non-stationary data with OLS if the series are cointegrated. However, if you do so, you had better show that the series are indeed cointegrated, and then adjust the parameter covariance matrix accordingly if you need inference. The parameter estimates themselves would be fine. As I mentioned in the original answer, these are advanced concepts that are usually outside undergrad courses.
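To illustrate why the parameters "would be fine" under cointegration, here is a minimal simulated sketch (the data-generating process, coefficient 2 and seed are assumptions for illustration). x is a random walk and y equals 2x plus stationary noise, so OLS in levels recovers the slope well. The sketch deliberately prints no standard errors, since the conventional OLS ones are exactly what would need adjusting for inference.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated data (an assumption for illustration): x is a random walk, and
# y = 2*x + stationary noise, so x and y are cointegrated with coefficient 2
x = np.cumsum(rng.standard_normal(n))
y = 2.0 * x + rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta[1])  # slope estimate, close to the true value 2
```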

Related Solutions

Solved – Steps to perform time series analysis

It sounds like you want to fit an ARIMAX model to your time series. I would try to fit an ADL (autoregressive distributed lag) model or an ECM (error correction model), or apply the Engle-Granger two-step procedure, to see whether your series cointegrate and, if they do, to estimate the long-run relationship between them. If they do not cointegrate, continue with the ARIMAX model or estimate stationary ADL or ECM models. Note that an ADL model and an ARIMAX model are very similar. Although cointegration analysis with several variables is quite an endeavour and fills entire textbooks (see e.g. Katarina Juselius' “The Cointegrated VAR Model: Methodology and Applications”), cointegration analysis with only two variables is fairly quick and easy, depending on which approach you use. Part of this answer repeats one I gave to a similar question.

I will outline the steps you should follow to model the time series appropriately. Remember first that there are different kinds of non-stationarity and different ways of dealing with them. Four common ones are:

1) Deterministic trends or trend stationarity. If your series is of this kind de-trend it or include a time trend in the regression/model. You might want to check out the Frisch–Waugh–Lovell theorem on this one.

2) Level shifts and structural breaks. If this is the case, include a dummy variable for each break or, if your sample is long enough, model each regime separately.

3) Changing variance. Either model the samples separately or model the changing variance using the ARCH or GARCH modelling class.

4) Unit roots. If your series contain a unit root, in general you should check for cointegrating relationships between the variables, but since you are concerned with univariate forecasting you should difference the series once or twice, depending on the order of integration.

The steps to model the series:

1) Look at the ACF and PACF together with a time series plot to get an indication of whether the series is stationary. If the ACF decays very slowly and the plot shows no mean reversion, this is a good indication that the series contains a unit root.
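To see what a "very slowly decaying" ACF looks like, here is a minimal sketch on simulated data (the helper name `sample_acf`, the sample size and the seed are assumptions). The sample ACF of a random walk stays high for many lags, while that of white noise is close to zero at every lag.

```python
import numpy as np

def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag (hypothetical helper)."""
    z = series - series.mean()
    denom = z @ z
    return np.array([z[k:] @ z[:-k] / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(2)
noise = rng.standard_normal(300)  # simulated white noise: stationary
walk = np.cumsum(noise)           # simulated random walk: contains a unit root

print(np.round(sample_acf(walk, 10), 2))   # stays high, decays very slowly
print(np.round(sample_acf(noise, 10), 2))  # all close to 0
```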

2) Test the series for a unit root. There is a wide range of tests; some of the most common are the ADF test, the Phillips-Perron (PP) test, the KPSS test (which has stationarity as its null hypothesis) and the DF-GLS test, which is the most efficient of these. NOTE: if your series contain a structural break, these tests are biased towards not rejecting the null of a unit root. If you suspect one or more structural breaks and want to check the robustness of these tests, use unit root tests that allow for endogenous structural breaks. Two common ones are the Zivot-Andrews test, which allows for one endogenous structural break, and the Clemente-Montañés-Reyes test, which allows for two. The latter comes in two versions: an additive outlier model, which captures sudden changes in the mean of the series, and an innovational outlier model, which captures gradual changes. Look these tests up on Wikipedia or in an econometrics textbook. Some statistical packages have them built in, which makes running a battery of unit root tests on your series very easy.

If your series contain a unit root, test their first differences to see whether they contain a second unit root.
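The logic of step 2 (test the levels, then the first differences) can be sketched with a plain Dickey-Fuller regression. This is a simplified illustration, not the ADF, PP, KPSS or DF-GLS tests named above: it includes no lagged differences, uses the approximate asymptotic 5% critical value of -2.86 for the constant-only case, and the helper `order_of_integration` is hypothetical. For real work, a package implementation (for example `adfuller` in Python's statsmodels) is the safer choice.

```python
import numpy as np

# Approximate asymptotic 5% critical value of the Dickey-Fuller t-statistic
# for a regression with a constant and no trend (not the usual -1.96!)
DF_CRIT_5PCT = -2.86

def df_stat(y):
    """Plain Dickey-Fuller t-statistic: regress diff(y) on a constant and y_{t-1}.
    No lagged differences are included, so this is DF, not ADF."""
    dy = np.diff(y)
    ylag = y[:-1]
    X = np.column_stack([np.ones_like(ylag), ylag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

def order_of_integration(y, max_d=2):
    """Hypothetical helper: difference y until the DF test rejects a unit root."""
    for d in range(max_d + 1):
        if df_stat(y) < DF_CRIT_5PCT:
            return d
        y = np.diff(y)
    return None  # still looks non-stationary after max_d differences

rng = np.random.default_rng(3)
walk = np.cumsum(rng.standard_normal(300))  # simulated I(1) series
print(df_stat(walk), df_stat(np.diff(walk)))
print(order_of_integration(walk))  # usually 1
```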

3) In case your series are non-stationary then you should:

        A) Apply the Engle-Granger 2-step procedure

        B) Apply an ADL model

        C) Apply an ECM model

You could also use the Johansen cointegration test or other tests, but for simplicity these are left out; in your case, with only two time series, any one of A), B) and C) will suffice. Although the Engle-Granger procedure is easier to apply (at least in my view), the ADL/ECM estimators are preferable, as Monte Carlo simulations show.

I will not explain all these approaches or how to derive the long-run solution, as that would take a considerable amount of time and space, but here is an excellent introduction to these methods:

http://www.econ.ku.dk/metrics/Econometrics2_07_I/LectureNotes/Cointegration.pdf
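For option A), a minimal sketch of the Engle-Granger two-step procedure with two variables might look like the code below. Assumptions: simulated data, a residual test without augmentation lags (in practice you would usually add lagged differences), and an approximate asymptotic 5% critical value of -3.34 for two variables with a constant. Note that the residual test uses Engle-Granger critical values, which are stricter than the ordinary Dickey-Fuller ones. statsmodels' `coint` function is one packaged alternative.

```python
import numpy as np

# Approximate asymptotic 5% critical value for the Engle-Granger residual test
# with two variables and a constant (stricter than the plain DF value -2.86)
EG_CRIT_5PCT = -3.34

def ols(y, X):
    """OLS estimates, conventional standard errors and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, resid

def engle_granger(y, x):
    n = len(y)
    # Step 1: static long-run regression y_t = a + b*x_t + u_t
    lr_beta, _, u = ols(y, np.column_stack([np.ones(n), x]))
    # Unit-root test on the residuals: diff(u)_t = rho*u_{t-1} + e_t (no constant)
    rho, rho_se, _ = ols(np.diff(u), u[:-1, None])
    tau = rho[0] / rho_se[0]
    # Step 2: ECM dy_t = c + g*dx_t + alpha*u_{t-1} + e_t
    ecm_X = np.column_stack([np.ones(n - 1), np.diff(x), u[:-1]])
    ecm_beta, _, _ = ols(np.diff(y), ecm_X)
    return lr_beta, tau, ecm_beta

# Simulated cointegrated pair (assumed DGP): y = 1 + 2x + AR(1) error
rng = np.random.default_rng(4)
n = 300
x = np.cumsum(rng.standard_normal(n))
u_true = np.zeros(n)
for t in range(1, n):
    u_true[t] = 0.5 * u_true[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + u_true

lr_beta, tau, ecm_beta = engle_granger(y, x)
print("long-run slope:", lr_beta[1])
print("EG tau:", tau, "cointegrated at 5%:", tau < EG_CRIT_5PCT)
print("adjustment coefficient alpha:", ecm_beta[2])  # true value is -0.5
```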

4) When choosing the lag length of your ADL model, include enough lags to eliminate all residual autocorrelation.
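Here is a minimal sketch of an ADL(1,1) fit, the residual-autocorrelation check from step 4, and the long-run multiplier (b0 + b1) / (1 - a1) that the long-run solution gives for this model. Assumptions: simulated data with a stationary regressor for simplicity, and `fit_adl11` is a hypothetical helper. If the residual autocorrelation were clearly non-zero, you would add more lags of y and x.

```python
import numpy as np

def fit_adl11(y, x):
    """Hypothetical helper: OLS for y_t = c + a1*y_{t-1} + b0*x_t + b1*x_{t-1} + e_t."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:], x[:-1]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return beta, y[1:] - X @ beta

def lag1_autocorr(e):
    """First-order sample autocorrelation of the residuals."""
    z = e - e.mean()
    return (z[1:] @ z[:-1]) / (z @ z)

# Simulated data from an assumed ADL(1,1) process with a stationary regressor
rng = np.random.default_rng(5)
n = 2000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 1.0 * x[t] + 0.5 * x[t - 1] + rng.standard_normal()

(c, a1, b0, b1), resid = fit_adl11(y, x)
long_run = (b0 + b1) / (1 - a1)  # long-run multiplier; true value 1.5 / 0.5 = 3
print(long_run, lag1_autocorr(resid))
```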

5) After your cointegration analysis you are more or less done. Please note that in case you want to expand your model to several variables you should use the CVAR model and the analysis gets a lot more complicated as mentioned above.

6) If your variables do not cointegrate but contain a unit root, continue with your ARIMAX modelling:

        A) Difference the series

        B) Choose the lag length according to the ACF and PACF. Pick the best model according to the AIC, BIC or HQ criteria and test for residual autocorrelation using the Ljung-Box Q test. Test the significance of your variables.

        C) Estimate an ADL/ECM model on your data. Include enough lags to remove serial correlation and test the significance of the variables.
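Steps A) and B) can be sketched for the autoregressive part: difference the series, choose the AR order by an information criterion, then check the residuals with the Ljung-Box Q statistic. Assumptions: simulated data whose first difference is an AR(2), BIC only (AIC and HQ differ only in the penalty term), and hypothetical helper names. The Q value would be compared with a chi-square critical value with h minus p degrees of freedom.

```python
import numpy as np

def fit_ar(y, p, max_p):
    """OLS AR(p) with a constant on a common sample (first max_p obs dropped),
    so that information criteria are comparable across p."""
    Y = y[max_p:]
    cols = [np.ones(len(Y))] + [y[max_p - k:len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta, Y - X @ beta

def bic(resid, n_params):
    n = len(resid)
    return np.log(resid @ resid / n) + np.log(n) * n_params / n

def ljung_box(resid, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} r_k^2 / (n-k)."""
    n = len(resid)
    z = resid - resid.mean()
    r = np.array([z[k:] @ z[:-k] for k in range(1, h + 1)]) / (z @ z)
    return n * (n + 2) * np.sum(r**2 / (n - np.arange(1, h + 1)))

# Simulated I(1) series whose first difference is an AR(2) (assumed DGP)
rng = np.random.default_rng(6)
dy = np.zeros(1000)
for t in range(2, 1000):
    dy[t] = 0.5 * dy[t - 1] - 0.3 * dy[t - 2] + rng.standard_normal()
level = np.cumsum(dy)

d_level = np.diff(level)  # step A): difference the series
max_p = 4
scores = {p: bic(fit_ar(d_level, p, max_p)[1], p + 1) for p in range(max_p + 1)}
best_p = min(scores, key=scores.get)
q_best = ljung_box(fit_ar(d_level, best_p, max_p)[1], h=10)
print(best_p, q_best)
```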

7) If your variables are stationary, estimate a stationary ADL/ECM model for your data or proceed with your ARIMAX model, following the same steps as in 6). An excellent introductory note on stationary models can be found here: http://www.econ.ku.dk/metrics/Econometrics2_07_I/LectureNotes/dynamicmodels.pdf

If your series contain a unit root with drift, or no unit root but a deterministic trend, you can add a time trend to your specification. Further, check the first differences of the series and the time series plots to see whether your series contain a structural break and/or outliers, and include dummy variables for these. Note that you should test for structural breaks; see point 2) above. Another alternative is the Chow test. Thirdly, it could be a good idea to take natural logs of your variables, as this can stabilize the variance of the series. Because the log is a monotonic transformation, it does not change the ordering of the observations.
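The Chow test and the break-dummy remedy mentioned above can be sketched as follows, assuming the break date is known in advance (the Chow test requires this; the endogenous-break tests from point 2) do not). The data are simulated with a level shift of 3 at observation 200, and the helper names are hypothetical. The F-statistic would be compared with the F(k, n - 2k) distribution.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def chow_f(y, X, break_idx):
    """Chow F-statistic for a break at a known observation index."""
    n, k = X.shape
    s_pooled = ssr(y, X)
    s1, s2 = ssr(y[:break_idx], X[:break_idx]), ssr(y[break_idx:], X[break_idx:])
    return ((s_pooled - s1 - s2) / k) / ((s1 + s2) / (n - 2 * k))

# Simulated trending series with a level shift of 3 at t = 200 (assumed DGP)
rng = np.random.default_rng(8)
n, brk = 400, 200
t = np.arange(n, dtype=float)
shift = (t >= brk).astype(float)  # step dummy: 0 before the break, 1 after
y = 1.0 + 0.05 * t + 3.0 * shift + 0.5 * rng.standard_normal(n)

X_trend = np.column_stack([np.ones(n), t])
F = chow_f(y, X_trend, brk)  # compare with the F(2, 396) distribution

# Remedy: add the step dummy to the trend regression
beta_d, *_ = np.linalg.lstsq(np.column_stack([X_trend, shift]), y, rcond=None)
print(F, beta_d[2])  # beta_d[2] estimates the size of the level shift
```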

Hopefully this made some sense. Please note that this was a very short introduction to a topic that could easily fill several chapters of a textbook. I strongly recommend reading the two lecture notes I linked to, or getting hold of a textbook on time series analysis/econometrics. If you need help understanding some of the concepts, please feel free to ask! Model specifications and examples are all included in the lecture notes I linked to.

Solved – Correlation between monthly and quarterly data

Go back to basics and ask, "What is a correlation?" The right answer is that there are many measures of "correlation" when broadly considered as pairwise metrics for demonstrating association between variables. The naive answer is that it's a Pearson correlation, since that's the most commonly taught and known form. But Pearson correlations only measure pairwise, linear association and, moreover, have a rigorous requirement for interval or ratio scaled data. Spearman correlations, on the other hand, measure monotonic association, e.g., between ordinally scaled variables. For financial data, which don't always meet the requirements of a Pearson correlation, Spearman correlations are a much more sensible metric.
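The difference between linear and monotonic association shows up clearly on a curved but strictly increasing relationship. In this minimal sketch (made-up data; the hand-rolled rank function assumes no ties), the Spearman correlation, computed as the Pearson correlation of the ranks, is exactly 1, while the Pearson correlation is noticeably lower.

```python
import numpy as np

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

def ranks(v):
    """Ranks 0..n-1 (assumes no ties, for brevity)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def spearman(a, b):
    # Spearman correlation = Pearson correlation of the ranks (no ties)
    return pearson(ranks(a), ranks(b))

x = np.linspace(0.1, 5, 50)
y = np.exp(x)  # perfectly monotonic but strongly non-linear

print(pearson(x, y))   # noticeably below 1
print(spearman(x, y))  # 1 up to floating-point rounding
```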

There are also many, many measures of association for categorically scaled data, as well as nonlinear measures of association such as "distance" correlations, and so on.

In addition, both Pearson and Spearman correlations range between $-1$ and $1$. Given that (and Street vernacular notwithstanding), it's completely erroneous to speak of them in percentage terms.

Not knowing where you got this rule of thumb of a 12 period lag resulting in an "85%" correlation between GDP and PMI, what is your goal? In other words, why are you even interested in replicating such a hoary convention? Moreover, you have such a long time series from Quandl -- back to 1950 -- what is it that makes you so concerned about the difference in the units of time?

What would I do to address your question? I would merge the two series, with 3 monthly periods per quarter, and do a whole lot of exploratory analysis: scatterplots, time lines, etc. Based on that, I would next run both Pearson and Spearman correlations at different lags, e.g., from 1 quarter up to whatever you think a reasonable maximum number of lags is. Then I would examine where the association is maximized. That's it.
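A minimal sketch of that workflow: average each block of 3 monthly observations into a quarter, then compute the correlation at a range of lags and see where it peaks. Assumptions: the data are simulated (they are not the Quandl series), the quarterly GDP proxy is constructed to follow the monthly indicator with a 2-quarter lag, averaging is the chosen aggregation (end-of-quarter values are another option), and only Pearson is shown. A Spearman version would apply the same function to ranks.

```python
import numpy as np

def monthly_to_quarterly(monthly):
    """Average each block of 3 months into one quarter (drops an incomplete last quarter)."""
    n_q = len(monthly) // 3
    return monthly[: 3 * n_q].reshape(n_q, 3).mean(axis=1)

def lagged_corr(lead, follow, lag):
    """Pearson correlation between lead_t and follow_{t+lag}."""
    if lag == 0:
        return np.corrcoef(lead, follow)[0, 1]
    return np.corrcoef(lead[:-lag], follow[lag:])[0, 1]

rng = np.random.default_rng(7)
pmi_monthly = rng.standard_normal(300)    # hypothetical monthly indicator
pmi_q = monthly_to_quarterly(pmi_monthly)  # 100 quarters

# Hypothetical quarterly GDP series that follows the indicator with a 2-quarter lag
gdp_q = 0.3 * rng.standard_normal(len(pmi_q))
gdp_q[2:] += pmi_q[:-2]

corrs = {lag: lagged_corr(pmi_q, gdp_q, lag) for lag in range(0, 5)}
best_lag = max(corrs, key=corrs.get)
print(corrs, best_lag)
```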

Of course there are more rigorous ways to answer the question that go beyond simple measures of pairwise association, but that's not what you're asking for.
