First, you are supposed to supply raw forecast errors to the Diebold-Mariano test function `dm.test`. However, you are supplying squared forecast errors (in the text part above the separating line).
Second, the choice of `power` is determined entirely by your loss function, as you noted, and only you know your loss function. Suppose you lose $|x|$ dollars if the forecast error is $x$. Then your loss function is linear and you should use the option `power=1`. On the other hand, your pain may grow quadratically, so that you lose $x^2$ dollars when the forecast error is $x$; then you should use `power=2`. If you are unsure about your own loss function, you may ask another question at this site giving the context of your application. But since at one point you say that you are using RMSE as the forecast accuracy measure, it may be sensible to use `power=2` to be consistent.
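For illustration, here is a minimal R sketch using `dm.test` from the forecast package with made-up error series (not your data); the point is that the raw errors are passed directly and `power` encodes the loss:

```r
library(forecast)

set.seed(1)
# Hypothetical raw one-step-ahead forecast errors from two competing forecasts
e1 <- rnorm(100, sd = 1.0)
e2 <- rnorm(100, sd = 1.2)

dm.test(e1, e2, h = 1, power = 1)  # linear loss: you lose |x| dollars for an error of x
dm.test(e1, e2, h = 1, power = 2)  # quadratic loss: consistent with using RMSE
```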
Third, the $p$-value tells you how likely you would be to observe a difference in losses (due to the forecast errors) at least as large as the one actually observed, if the losses were in fact equal in the population.
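In symbols (a sketch using standard DM-test notation; the symbols $e_{1t}$, $e_{2t}$, $d_t$ are mine, not the question's): with forecast errors $e_{1t}$ and $e_{2t}$, loss function $L(\cdot)$ and loss differential $d_t = L(e_{1t}) - L(e_{2t})$, the null hypothesis and the $p$-value are
$$
H_0:\ \mathbb{E}[d_t] = 0,
\qquad
p = \Pr\bigl(|\mathrm{DM}| \ge |\mathrm{DM}_{\text{obs}}| \mid H_0\bigr),
$$
where $\mathrm{DM} = \bar{d}\big/\sqrt{\widehat{\operatorname{Var}}(\bar{d})}$ is the test statistic built from the sample mean loss differential $\bar{d}$.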
Finally, I would not be comfortable with an approach like *I want to underpin statistically that model 2 has better accuracy*. Shouldn't you care about finding out the truth as far as the available data and the statistical methods can help you? If model 1 were better than model 2 in reality, wouldn't you want to learn that? It may be tempting to abuse statistics to obtain a result you are wishing for, but... Then again, perhaps I am misinterpreting you.
The answer to this question can be found in a later question here: Diebold-Mariano test for multiple prediction horizons
We cannot compare forecasts for *vectors* of differing (in this case increasing) points in time; instead, we must use a rolling-window approach with a fixed $h$ to compare the two models' $h$-step forecasts over time, as sketched below.
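For concreteness, here is a rough rolling-window sketch in R; the series, first origin, horizon, and the use of `auto.arima`/`ets` as the two models are illustrative assumptions, not part of the original question:

```r
library(forecast)

y <- AirPassengers   # stand-in series; replace with your own data
h <- 3               # the fixed horizon whose forecasts we compare
n <- length(y)
first_origin <- 100  # first forecast origin of the rolling window

e1 <- e2 <- numeric(0)
for (t in first_origin:(n - h)) {
  train <- window(y, end = time(y)[t])               # data available at origin t
  f1 <- forecast(auto.arima(train), h = h)$mean[h]   # h-step forecast from model 1
  f2 <- forecast(ets(train), h = h)$mean[h]          # h-step forecast from model 2
  actual <- y[t + h]
  e1 <- c(e1, actual - f1)                           # raw h-step errors, model 1
  e2 <- c(e2, actual - f2)                           # raw h-step errors, model 2
}

dm.test(e1, e2, h = h, power = 2)  # DM test on the two h-step error series
```

Refitting both models at every origin can be slow; in practice you may refit less often or keep the model specifications fixed across origins.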
Best Answer
It depends on what you want to learn from the test result.
If you wonder whether one forecast (say, $f_1$) is statistically more accurate than another (say, $f_2$), the Diebold-Mariano (DM) test will tell you that. At this point there is no talk of the models that generated the forecasts.
The DM test (like any other statistical test) targets inference about the population rather than the current sample. If (1) the DM test tells you with 95% confidence that $f_1$ beats $f_2$ and (2) the forecast-generating processes and the data-generating process all remain unchanged in the future, then you would expect $f_1$ to beat $f_2$ in the future as well.
How can you benefit from this result? If you have the forecast-generating processes available, you could choose to use the one behind $f_1$ rather than the one behind $f_2$. However, Diebold (2015) does not encourage that:
If you wonder which of the alternative models is more likely to have generated the data, using the DM test will be problematic in the case of nested models, as explained in Clark & McCracken (2001). (Once again, Diebold did not intend the test to be used for comparing models; see the quote above.)
How badly does the DM test fail in this sense? Simulation results are reported in the tables of Clark & McCracken (2001); you may check them there.
References:

- Clark, T. E. & McCracken, M. W. (2001). "Tests of Equal Forecast Accuracy and Encompassing for Nested Models." *Journal of Econometrics*, 105(1).
- Diebold, F. X. (2015). "Comparing Predictive Accuracy, Twenty Years Later: A Personal Perspective on the Use and Abuse of Diebold-Mariano Tests." *Journal of Business & Economic Statistics*, 33(1).