I have read that using R-squared for time series is not appropriate because, in a time series context (I know that there are other contexts), R-squared is no longer unique. Why is this? I tried to look this up, but I did not find anything. Typically I do not place much value on R-squared (or adjusted R-squared) when I evaluate my models, but many of my colleagues (i.e. Business Majors) are absolutely in love with R-squared, and I want to be able to explain to them why R-squared is not appropriate in the context of time series.
Time Series Models – Problem with Using R-squared
r-squared, regression, time-series
Related Solutions
Why create a whole new method, i.e., time series (ARIMA), instead of using multiple linear regression and adding lagged variables to it (with the order of lags determined using ACF and PACF)?
One immediate point is that a linear regression works only with observed variables, while ARIMA incorporates unobserved variables in the moving-average part; thus ARIMA is more flexible, or more general, in a way. An AR model can be seen as a linear regression model, and its coefficients can be estimated using OLS: $\hat\beta_{OLS}=(X'X)^{-1}X'y$, where $X$ consists of lags of the dependent variable, which are observed. Meanwhile, MA or ARMA models do not fit into the OLS framework, since some of the variables, namely the lagged error terms, are unobserved; hence the OLS estimator is infeasible.
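The AR-via-OLS point can be sketched numerically. Below is a minimal illustration (Python/NumPy; the AR(1) coefficient 0.7, the sample size, and the random seed are assumptions made up for the example): because the regressor is simply the observed lagged value of the series, the usual least-squares formula applies directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) process y_t = 0.7 * y_{t-1} + e_t
# (phi = 0.7 is an assumed value, chosen only for illustration).
n, phi = 500, 0.7
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]

# The lagged dependent variable is observed, so the AR coefficient
# can be estimated by OLS: beta_hat = (X'X)^{-1} X'y with X = y_{t-1}.
X = y[:-1].reshape(-1, 1)
target = y[1:]
phi_hat = np.linalg.lstsq(X, target, rcond=None)[0][0]
print(phi_hat)  # close to the true 0.7
```

No such construction exists for an MA term, because the lagged error it would need as a regressor is never observed.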
Is one G-M assumption that the independent variables should be normally distributed? Or just the dependent variable conditional on the independent ones?
The normality assumption is sometimes invoked for the model errors, not for the independent variables. However, normality is required neither for the consistency and efficiency of the OLS estimator nor for the Gauss-Markov theorem to hold. The Wikipedia article on the Gauss-Markov theorem states explicitly that "The errors do not need to be normal".
Multicollinearity between variables may (obviously) arise, so estimates would be wrong.
A high degree of multicollinearity means inflated variance of the OLS estimator. However, the OLS estimator is still BLUE as long as the multicollinearity is not perfect. Thus your statement does not look right.
It is obvious that, even with lagged variables, OLS problems arise and it is neither efficient nor correct; but when using maximum likelihood, do these problems persist?
An AR model can be estimated using both OLS and ML; both of these methods give consistent estimators. MA and ARMA models cannot be estimated by OLS, so ML is the main choice; again, it is consistent. The other interesting property is efficiency, and here I am not completely sure (but clearly the information should be available somewhere as the question is pretty standard). I would try commenting on "correctness", but I am not sure what you mean by that.
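To make concrete why MA estimation falls outside OLS, here is a hedged sketch (simulated data; theta = 0.6, the sample size, and the crude grid search are all illustrative choices — real ML routines use proper numerical optimisers) of the conditional-sum-of-squares recursion that likelihood-based estimators of MA models rely on:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate an MA(1): y_t = e_t + 0.6 * e_{t-1}
# (theta = 0.6 is an assumed value for illustration).
# The e_{t-1} are unobserved, so there is no observed regressor for OLS.
n, theta = 2000, 0.6
e = rng.normal(size=n)
y = e + theta * np.concatenate(([0.0], e[:-1]))

def css(th):
    # Conditional sum of squares: recursively back out the innovations
    # implied by a candidate theta (conditioning on a zero pre-sample error),
    # then sum their squares.
    eh = np.empty(n)
    eh[0] = y[0]
    for t in range(1, n):
        eh[t] = y[t] - th * eh[t - 1]
    return np.sum(eh ** 2)

# The objective is nonlinear in theta, so minimise it numerically
# (a simple grid search stands in for a real optimiser here).
grid = np.linspace(-0.95, 0.95, 191)
theta_hat = grid[np.argmin([css(th) for th in grid])]
print(theta_hat)  # close to the true 0.6
```

The key design point: the innovations must be reconstructed recursively from a candidate parameter value, which is exactly the step that has no counterpart in the closed-form OLS algebra.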
(1) There is some correlation in the ordering of the observations. In this case, (part of) the reason is that the observations are ordered by Cult (a factor indicating the cultivator of the cabbages). And because the first cultivator is mostly associated with negative residuals and the second cultivator mostly with positive residuals, this pattern will be picked up by diagnostic tests. It might look like a "trend" or like "autocorrelation" if this is all the tests look for.
(2) Linear regression itself seems to work OK. But it is important to control for Cult and not only for HeadWt. Possibly Date could be relevant as well. It would also be good to check what the MASS book says about the data (my copy is in the office, hence I can't check right now).
(3) No. The Durbin-Watson is appropriate if you have correlations over "time" or some other kind of natural ordering of the observations. And even then there might be other autocorrelation tests that could be more suitable.
Best Answer
Some aspects of the issue:
If somebody gives us a vector of numbers $\mathbf y$ and a conformable matrix of numbers $\mathbf X$, we do not need to know what the relation between them is in order to execute some estimation algebra, treating $\mathbf y$ as the dependent variable. The algebra will go through irrespective of whether these numbers represent cross-sectional, time-series, or panel data, or of whether the matrix $\mathbf X$ contains lagged values of $y$, etc.
The fundamental definition of the coefficient of determination $R^2$ is
$$R^2 = 1 - \frac {SS_{res}}{SS_{tot}}$$
where $SS_{res}$ is the sum of squared residuals from some estimation procedure, and $SS_{tot}$ is the sum of squared deviations of the dependent variable from its sample mean.
Combining these, $R^2$ will always be uniquely calculated for a specific data sample, a specific formulation of the relation between the variables, and a specific estimation procedure, subject only to the condition that the estimation procedure provides point estimates of the unknown quantities involved (and hence point estimates of the dependent variable, and hence point estimates of the residuals). If any of these three aspects changes, the arithmetic value of $R^2$ will in general change, but this holds for any type of data, not just time series.
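As a minimal sketch of the uniqueness point (toy data; the coefficients 2.0 and 1.5, the sample size, and the seed are assumptions invented for the example), here the definition above is applied directly: one sample, one specification, one estimator, hence one number, and nothing in the arithmetic cares whether the rows are cross-sectional units or time periods.

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy sample; the rows could equally be cross-sectional or time-series values.
x = rng.normal(size=50)
y = 2.0 + 1.5 * x + rng.normal(size=50)

# Specification: linear with intercept. Estimation procedure: OLS.
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# R^2 from its fundamental definition: 1 - SS_res / SS_tot.
ss_res = np.sum(resid ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(r2)  # a single, well-defined number for this sample/model/estimator
```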
So the issue with $R^2$ and time-series, is not whether it is "unique" or not (since most estimation procedures for time-series data provide point estimates). The issue is whether the "usual" time series specification framework is technically friendly for the $R^2$, and whether $R^2$ provides some useful information.
The interpretation of $R^2$ as "proportion of dependent variable variance explained" depends critically on the residuals adding up to zero. In the context of linear regression (on whatever kind of data), and of Ordinary Least Squares estimation, this is guaranteed only if the specification includes a constant term in the regressor matrix (a "drift" in time-series terminology). In autoregressive time-series models, a drift is in many cases not included.
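The intercept point can be demonstrated with a short sketch (synthetic data; the mean of 10, the slope 0.5, and the seed are assumptions for illustration): with a constant term the OLS residuals sum to zero, while without one they need not, and $R^2$ computed from the usual formula can even turn negative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data with a large mean (assumed values, for illustration only).
x = rng.normal(size=100)
y = 10.0 + 0.5 * x + rng.normal(size=100)

def fit_r2(X, y):
    # OLS fit, then residual sum and R^2 from the usual formula.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid.sum(), 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# With an intercept (a "drift"): residuals sum to ~0, R^2 behaves as expected.
resid_sum_c, r2_c = fit_r2(np.column_stack([np.ones_like(x), x]), y)

# Without an intercept, as in many autoregressive specifications: the
# residuals need not sum to zero, and R^2 can go negative, losing its
# "proportion of variance explained" reading.
resid_sum_nc, r2_nc = fit_r2(x.reshape(-1, 1), y)

print(resid_sum_c, r2_c)    # ~0 and a value in (0, 1)
print(resid_sum_nc, r2_nc)  # far from 0, and negative R^2
```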
More generally, when we are faced with time-series data, we "automatically" start thinking about how the series will evolve into the future. So we tend to evaluate a time-series model more on how well it predicts future values than on how well it fits past values. But $R^2$ mainly reflects the latter, not the former. The well-known fact that $R^2$ is non-decreasing in the number of regressors means that we can obtain a perfect fit by continuing to add regressors (any regressors, i.e. any series of numbers, perhaps totally unrelated conceptually to the dependent variable). Experience shows that a perfect fit obtained in this way will also give abysmal predictions outside the sample.
Intuitively, this perhaps counter-intuitive trade-off happens because, by capturing the whole variability of the dependent variable in an estimated equation, we turn unsystematic variability into a systematic one as regards prediction. (Here, "unsystematic" should be understood relative to our knowledge; from a purely deterministic philosophical point of view, there is no such thing as "unsystematic variability". But to the degree that our limited knowledge forces us to treat some variability as "unsystematic", the attempt to nevertheless turn it into a systematic component brings prediction disaster.)
In fact, this is perhaps the most convincing way to show somebody why $R^2$ should not be the main diagnostic/evaluation tool when dealing with time series: increase the number of regressors up to the point where $R^2\approx 1$, then take the estimated equation and try to predict the future values of the dependent variable.
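That experiment can be run in a few lines. Below is a sketch under invented assumptions (the "series" is pure noise around a mean; sample sizes, seed, and regressor counts are all made up for illustration): padding the regression with random regressors drives in-sample $R^2$ toward 1 while out-of-sample prediction error gets dramatically worse.

```python
import numpy as np

rng = np.random.default_rng(3)

# A short "time series" of pure noise around a mean (assumed toy data):
# 40 observations for fitting, 10 held out as "the future".
n_train, n_test = 40, 10
y = 5.0 + rng.normal(size=n_train + n_test)
y_train, y_test = y[:n_train], y[n_train:]

def fit_predict(k):
    # Regress on an intercept plus k regressors of pure noise, conceptually
    # unrelated to y; return in-sample R^2 and out-of-sample MSE.
    Z = rng.normal(size=(n_train + n_test, k))
    X = np.column_stack([np.ones(n_train + n_test), Z])
    beta = np.linalg.lstsq(X[:n_train], y_train, rcond=None)[0]
    resid = y_train - X[:n_train] @ beta
    r2 = 1 - np.sum(resid ** 2) / np.sum((y_train - y_train.mean()) ** 2)
    mse_out = np.mean((y_test - X[n_train:] @ beta) ** 2)
    return r2, mse_out

r2_small, mse_small = fit_predict(2)
r2_big, mse_big = fit_predict(39)  # nearly as many regressors as observations

print(r2_small, r2_big)    # in-sample R^2 climbs toward 1 ...
print(mse_small, mse_big)  # ... while out-of-sample error deteriorates
```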