**In-sample** $R^2$ is not a suitable measure of forecast accuracy because it does not account for overfitting. It is always possible to build a complicated model that fits the data perfectly in sample, but there is no guarantee that such a model will perform decently out of sample.

**Out-of-sample** $R^2$, i.e. the squared correlation between the forecasts and the actual values, is deficient in that it does not account for bias in forecasts.

For example, consider realized values

$$y_{t+1},\dotsc,y_{t+m}$$

and two competing forecasts:

$$\hat{y}_{t+1},\dotsc,\hat{y}_{t+m}$$

and

$$\tilde{y}_{t+1},\dotsc,\tilde{y}_{t+m}.$$

Now assume that

$$\tilde{y}_{t+i}=c+\hat{y}_{t+i}$$

for every $i$, where $c$ is a constant. That is, the two forecasts are identical except that the second one is higher by $c$. These two forecasts will generally have different MSE, MAPE, etc., but their $R^2$ will be the same.

Consider an extreme case: the first forecast is perfect, i.e. $\hat{y}_{t+i}=y_{t+i}$ for every $i$. The $R^2$ of this forecast will be 1 (which is very good). However, the $R^2$ of the other forecast will also be 1 even though the forecast is biased by $c$ for every $i$.
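This is easy to verify numerically. A minimal sketch (using NumPy and made-up data; the shift $c=3$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)   # realized values
f1 = y.copy()              # a perfect forecast
f2 = y + 3.0               # the same forecast shifted by c = 3

# Out-of-sample R^2 as squared correlation: blind to the bias
r2_1 = np.corrcoef(y, f1)[0, 1] ** 2
r2_2 = np.corrcoef(y, f2)[0, 1] ** 2
print(r2_1, r2_2)          # both 1.0

# MSE tells the two forecasts apart
print(np.mean((y - f1) ** 2))   # 0.0
print(np.mean((y - f2) ** 2))   # 9.0 = c^2
```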

It is a fact from calculus that a function $f(x)$ and its scaled version $cf(x)$ have the same argmin (the $x$ that minimizes the function) for any constant $c>0$. It follows that the following criteria all have the same argmin and thus give the same parameter estimates.

$$
\begin{gather*}
\sum (y_i - \hat y_i)^2 \\
\frac{\sum (y_i - \hat y_i)^2}{n} \\
\frac{\sum (y_i - \hat y_i)^2}{n-p} \\
\frac{\sum (y_i - \hat y_i)^2}{8}
\end{gather*}
$$

The first is the usual sum of squared errors. The next two are variants of mean squared error (the $n-p$ denominator has to do with getting an unbiased estimate of the variance of the error term). I made up the final one.

However, all of these give the same parameter estimates (barring numerical issues coming from doing math on a computer).
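A quick numerical check of this claim, as a sketch: below, a single location parameter is fitted by minimizing each of the four criteria with `scipy.optimize.minimize_scalar`, and all four recover the sample mean (the data and the choice of a constant-mean model are made up for illustration).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.normal(loc=5.0, scale=2.0, size=50)
n, p = len(y), 1  # one fitted parameter: a constant mean

# Four loss functions that differ only by a positive constant factor
losses = {
    "SSE":       lambda m: np.sum((y - m) ** 2),
    "MSE (n)":   lambda m: np.sum((y - m) ** 2) / n,
    "MSE (n-p)": lambda m: np.sum((y - m) ** 2) / (n - p),
    "SSE / 8":   lambda m: np.sum((y - m) ** 2) / 8,
}

estimates = {name: minimize_scalar(loss).x for name, loss in losses.items()}
for name, m_hat in estimates.items():
    print(f"{name}: {m_hat:.10f}")  # all four agree with y.mean()
```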

Mean squared error has the advantage of giving some sense of how far predictions are from the true values (though imperfectly, since it uses squared rather than absolute error), and it is related to the variance of the error term. Further, its value does not become arbitrarily large simply because you have many observations.

## Best Answer

To decide which point forecast error measure to use, we need to take a step back. Note that we don't know the future outcome perfectly, nor will we ever. So the future outcome follows a probability distribution. Some forecasting methods explicitly output such a full distribution, and some don't, but it is always there, if only implicitly.

Now, we want a good error measure for a point forecast. Such a point forecast $F_t$ is our attempt to summarize what we know about the future distribution (i.e., the predictive distribution) at time $t$ using a single number, a so-called *functional* of the future density. The error measure is then a way to assess the quality of this single-number summary. So you should choose an error measure that rewards "good" one-number summaries of (unknown, possibly forecasted, but possibly only implicit) future densities.

The challenge is that different error measures are minimized by different functionals. The expected MSE is minimized by the *expected value* of the future distribution, whereas the expected MAE is minimized by the *median* of the future distribution. Thus, if you calibrate your forecasts to minimize the MAE, your point forecast will be the future median, not the future expected value, and your forecasts will be biased if your future distribution is not symmetric.

This is most relevant for count data, which are typically skewed. In extreme cases (say, Poisson distributed sales with a mean below $\log 2\approx 0.69$), your MAE will be lowest for a flat zero forecast. See here or here or here for details.
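A small simulation sketch of the Poisson case (made-up data; the mean $\lambda = 0.5$ is chosen below $\log 2 \approx 0.69$, so more than half of the probability mass sits at zero and the median is 0):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 0.5                             # Poisson mean below log(2)
y = rng.poisson(lam, size=100_000)    # simulated "sales"

mae_zero = np.mean(np.abs(y - 0.0))   # flat zero forecast (the median)
mae_mean = np.mean(np.abs(y - lam))   # forecast at the true mean
print(mae_zero, mae_mean)             # the zero forecast has the lower MAE
```

Here the zero forecast "wins" on MAE even though it predicts that nothing will ever be sold, which is exactly the pathology described above.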

I give some more information and an illustration in What are the shortcomings of the Mean Absolute Percentage Error (MAPE)? That thread focuses on the MAPE but also covers other error measures, and it contains links to other related threads.

In the end, which error measure to use really depends on your Cost of Forecast Error, i.e., which kind of error is most painful. Without looking at the actual implications of forecast errors, any discussion about "better criteria" is basically meaningless.

Measures of forecast accuracy were a big topic in the forecasting community some years back, and they still pop up now and then. One very good article to look at is Hyndman & Koehler "Another look at measures of forecast accuracy" (2006).

Finally, one alternative is to calculate full predictive densities and assess these using proper scoring rules.