MA Model Estimation:
Let us assume a series with 100 time points, and say it is characterized by an MA(1) model with no intercept. The model is then given by
$$y_t=\varepsilon_t-\theta\varepsilon_{t-1},\quad t=1,2,\cdots,100\quad (1)$$
The error term here is not observed. To obtain it, Box et al., Time Series Analysis: Forecasting and Control (3rd Edition), page 228, suggest computing the error term recursively by
$$\varepsilon_t=y_t+\theta\varepsilon_{t-1}$$
So the error term for $t=1$ is,
$$\varepsilon_{1}=y_{1}+\theta\varepsilon_{0}$$
We cannot compute this without knowing the value of $\theta$. To obtain it, we first need an initial (preliminary) estimate of the model parameter. Box et al., Section 6.3.2, page 202 of the same book, state that:
It has been shown that the first $q$ autocorrelations of an MA($q$) process are nonzero and can be written in terms of the parameters of the model as
$$\rho_k=\frac{-\theta_{k}+\theta_1\theta_{k+1}+\theta_2\theta_{k+2}+\cdots+\theta_{q-k}\theta_q}{1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2},\quad k=1,2,\cdots,q$$
The expression above for $\rho_1,\rho_2,\cdots,\rho_q$ in terms of $\theta_1,\theta_2,\cdots,\theta_q$ supplies $q$ equations in $q$ unknowns. Preliminary estimates of the $\theta$s can be obtained by substituting estimates $r_k$ for $\rho_k$ in the above equation.
Note that $r_k$ is the estimated autocorrelation. There is more discussion in Section 6.3, Initial Estimates for the Parameters, which is worth reading. Now, assume we obtain the initial estimate $\theta=0.5$. Then,
$$\varepsilon_{1}=y_{1}+0.5\varepsilon_{0}$$
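As a concrete sketch of this preliminary step for an MA(1), the single equation $\rho_1=-\theta/(1+\theta^2)$ can be solved for $\theta$ after substituting the sample autocorrelation $r_1$. The R snippet below is only an illustration on a simulated series (the seed, the series, and the use of `arima.sim` and `acf` are my assumptions, not part of the original discussion); it keeps the invertible root with $|\theta|<1$.

```r
# Illustration (assumed simulated data): preliminary MA(1) estimate from the
# lag-1 sample autocorrelation r1, using rho_1 = -theta / (1 + theta^2).
# Note: arima.sim() uses the y_t = e_t + theta*e_{t-1} convention, so ma = -0.5
# corresponds to theta = 0.5 in the sign convention of model (1).
set.seed(1)
y  <- arima.sim(model = list(ma = -0.5), n = 100)
r1 <- acf(y, lag.max = 1, plot = FALSE)$acf[2]

# Solve r1*theta^2 + theta + r1 = 0; a real root exists only when |r1| <= 0.5,
# and the invertible root (|theta| < 1) is:
theta0 <- (-1 + sqrt(1 - 4 * r1^2)) / (2 * r1)
theta0
```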
Now, another problem is that we don't have a value for $\varepsilon_0$, because $t$ starts at 1, and so we cannot compute $\varepsilon_1$. Luckily, there are two methods to obtain this:
- Conditional Likelihood
- Unconditional Likelihood
According to Box et al., Section 7.1.3, page 227, $\varepsilon_0$ can be set to zero as an approximation if $n$ is moderate or large; this is the conditional likelihood approach. Otherwise, the unconditional likelihood is used, wherein the value of $\varepsilon_0$ is obtained by back-forecasting; Box et al. recommend this method. Read more about back-forecasting in Section 7.1.4, page 231.
After obtaining the initial estimate and a value for $\varepsilon_0$, we can finally proceed with the recursive calculation of the error terms. The final stage is then to estimate the parameter of model $(1)$; note that this is no longer the preliminary estimate.
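Here is a minimal sketch of that recursion under the conditional approach ($\varepsilon_0=0$), with $\theta$ held fixed at the preliminary estimate; the variable names and the simulated series `y` from the earlier snippet are my own assumptions.

```r
# Sketch: recursive error terms for model (1), conditional on eps_0 = 0,
# with theta fixed at the preliminary estimate theta0 from the earlier snippet.
theta    <- theta0
eps      <- numeric(length(y))
eps_prev <- 0                             # conditional likelihood: eps_0 = 0
for (t in seq_along(y)) {
  eps[t]   <- y[t] + theta * eps_prev     # eps_t = y_t + theta * eps_{t-1}
  eps_prev <- eps[t]
}
sum(eps^2)                                # conditional sum of squares S(theta)
```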
In estimating the parameter $\theta$, I use a nonlinear estimation procedure, specifically the Levenberg-Marquardt algorithm, since MA models are nonlinear in their parameters.
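One way to carry this out in R, assuming the minpack.lm package is available (my assumption; the original discussion does not mention R), is to hand Levenberg-Marquardt the vector of recursively computed errors and let it minimize their sum of squares. `arima()` is shown only as a rough cross-check, since it uses a different estimation method and the opposite sign convention.

```r
# Sketch of the final estimation stage: minimize the conditional sum of squares
# over theta with Levenberg-Marquardt (minpack.lm is an assumption here).
resid_ma1 <- function(theta, y) {
  eps      <- numeric(length(y))
  eps_prev <- 0
  for (t in seq_along(y)) {
    eps[t]   <- y[t] + theta * eps_prev
    eps_prev <- eps[t]
  }
  eps                                # nls.lm minimizes sum(eps^2)
}

library(minpack.lm)
fit <- nls.lm(par = c(theta = theta0), fn = resid_ma1, y = y)
fit$par                              # final (non-preliminary) estimate of theta

# Rough cross-check with R's built-in estimator; arima() parameterizes the model
# as y_t = eps_t + theta*eps_{t-1}, so its ma1 coefficient is -theta here.
arima(y, order = c(0, 0, 1), include.mean = FALSE)
```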
Overall, I would highly recommend you to read Box et al. Time Series Analysis: Forecasting and Control (3rd Edition).
Best Answer
I really think this is a good question and it deserves an answer. The linked page is written by a psychologist who claims that some home-brew method is a better way of doing time series analysis than Box-Jenkins. I hope that my attempt at an answer will encourage others who are more knowledgeable about time series to contribute.
From his introduction, it looks like Darlington is championing the approach of just fitting an AR model by least squares. That is, if you want to fit the model $$z_t = \alpha_1 z_{t-1} + \cdots + \alpha_k z_{t-k} + \varepsilon_t$$ to the time series $z_t$, you can just regress the series $z_t$ on the series at lag $1$, lag $2$, and so on up to lag $k$, using an ordinary multiple regression. This is certainly allowed; in R, it's even an option in the `ar` function. I tested it out, and it tends to give similar answers to the default method for fitting an AR model in R.

He also advocates regressing $z_t$ on things like $t$ or powers of $t$ to find trends. Again, this is absolutely fine. Lots of time series books discuss this, for example Shumway-Stoffer and Cowpertwait-Metcalfe. Typically, a time series analysis might proceed along the following lines: you find a trend, remove it, then fit a model to the residuals.
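A small sketch of that comparison (the simulated data and parameter values below are my own, chosen purely for illustration): fit an AR model by ordinary least squares on the lagged series and compare it with R's default fitting method.

```r
# Sketch: fit an AR(2) by least squares on the lagged values, and compare with
# R's default method in ar() (Yule-Walker). Simulated data, chosen arbitrarily.
set.seed(1)
z <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 200)

fit_ols <- ar(z, order.max = 2, aic = FALSE, method = "ols")  # least-squares fit
fit_yw  <- ar(z, order.max = 2, aic = FALSE)                  # default: Yule-Walker
fit_ols$ar
fit_yw$ar

# The same least-squares idea written as an ordinary multiple regression on the lags.
lagged <- as.data.frame(embed(z, 3))
names(lagged) <- c("z", "z_lag1", "z_lag2")
coef(lm(z ~ z_lag1 + z_lag2, data = lagged))
```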
But it seems like he is also advocating over-fitting, and then using the reduction in mean-squared error between the fitted series and the data as evidence that his method is better.
This is not a good idea because the test of a model is supposed to be how well it can forecast, not how well it fits the existing data. In his three examples, he uses "adjusted root mean-squared error" as his criterion for the quality of the fit. Of course, over-fitting a model is going to make an in-sample estimate of error smaller, so his claim that his models are "better" because they have smaller RMSE is wrong.
In a nutshell, since he is using the wrong criterion for assessing how good a model is, he reaches the wrong conclusions about regression vs. ARIMA. I'd wager that, if he had tested the predictive ability of the models instead, ARIMA would have come out on top. Perhaps someone can try it if they have access to the books he mentions here.
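Lacking his data, here is a hedged sketch of how such a predictive comparison could be run on a simulated series: compare in-sample RMSE with RMSE on a held-out period, for an over-fitted trend regression and for a simple ARIMA fit. All modeling choices below are arbitrary illustrations, not a reproduction of his examples.

```r
# Sketch: in-sample vs. out-of-sample RMSE on a held-out test period,
# for an over-fitted polynomial trend regression and a simple ARIMA fit.
set.seed(1)
z     <- arima.sim(model = list(ar = 0.7), n = 120)
train <- window(z, end = 100)
test  <- window(z, start = 101)

# Over-fitted regression: a high-order polynomial in time.
t_tr <- seq_along(train)
reg  <- lm(train ~ poly(t_tr, 10))
rmse_in_reg  <- sqrt(mean(resid(reg)^2))
rmse_out_reg <- sqrt(mean((test - predict(reg, newdata = data.frame(t_tr = 101:120)))^2))

# ARIMA fit on the same training data.
fit <- arima(train, order = c(1, 0, 0))
rmse_in_arima  <- sqrt(mean(resid(fit)^2))
rmse_out_arima <- sqrt(mean((test - predict(fit, n.ahead = 20)$pred)^2))

c(in_reg = rmse_in_reg, out_reg = rmse_out_reg,
  in_arima = rmse_in_arima, out_arima = rmse_out_arima)
```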
[Supplemental: for more on the regression idea, you might want to check out older time series books written before ARIMA became the most popular approach. For example, Kendall, Time-Series (1973), devotes a whole chapter (Chapter 11) to this method and to comparisons with ARIMA.]