Solved – Diebold-Mariano test for predictive accuracy

Tags: accuracy, diebold-mariano-test, prediction, statistical-significance

I am using the Diebold-Mariano test from the forecast package in R to compare predictive accuracy. In particular, I want to show statistically that model 2 has better accuracy. What I did is the following:

These are the squared errors from two forecast models:

squaredErrorsFromForecastModelOne <- c ( 1.08431, 0.94595, 0.81180, 0.02976, 0.74494, 0.61874, 0.50382, 0.40145, 19.72515, 0.00127, 0.00417, 1.18810, 15.98400, 2.13949, 6.95535, 0.09054, 3.25766, 5.37266, 3.97883, 3.44511, 0.50808, 2.81132, 2.33295, 385.03073, 12.52735, 58.53015, 5.54603, 18.80436, 8.54802, 20.89861, 18.24486, 15.67131, 2.68173, 12.47644, 4.84924, 3.93189, 7.65020, 5.96776, 4.52711, 3.32260, 2.34151, 0.56025, 1.45975, 1.08764, 0.00341, 17.40392, 1.36376, 0.00146, 9.75438, 0.75412, 23.33373, 0.42497, 2.01754, 0.07355, 0.58630, 18.56576, 1.36259, 0.00709, 0.79477, 0.57882, 0.13286, 1.88705, 2.99913, 2.22159, 1.89255, 5.10173, 4.12374, 3.25911, 2.51001, 1.87580, 1.35187, 0.93084, 0.60388, 0.36036)

squaredErrorsFromForecastModelTwo <- c ( 0.00000, 0.00000, 0.00000, 0.00640, 0.00005, 0.00004, 0.00003, 0.00002, 0.16288, 0.01416, 0.01439, 0.00166, 0.18857, 0.00420, 0.14018, 0.05499, 0.00593, 0.14797, 0.00691, 0.00487, 0.01922, 0.00319, 0.00225, 2.95785, 0.03410, 1.07147, 0.13653, 0.03624, 0.51828, 0.03413, 0.02408, 0.01699, 0.07263, 0.01132, 0.02868, 0.02633, 0.00582, 0.00411, 0.00290, 0.00204, 0.00144, 0.03683, 0.00170, 0.03787, 0.01525, 0.19688, 0.00529, 0.01991, 0.14327, 0.05662, 0.31346, 0.03621, 0.11550, 0.07048, 0.03349, 0.32915, 0.04077, 0.07269, 0.12005, 0.12323, 0.07556, 0.03644, 0.17670, 0.03729, 0.03320, 0.00866, 0.00611, 0.00431, 0.00304, 0.00215, 0.00151, 0.00107, 0.00075, 0.00053)

This is the output I get (errLin and errRob hold the squared-error vectors defined above):

> dm.test(errLin, errRob, h=1)

        Diebold-Mariano Test

data:  errLin and errRob
DM = 1.0514, Forecast horizon = 1, Loss function power = 2, p-value = 0.2965
alternative hypothesis: two.sided

and

> dm.test(errLin, errRob, h=1, power=1)

        Diebold-Mariano Test

data:  errLin and errRob
DM = 1.995, Forecast horizon = 1, Loss function power = 1, p-value = 0.04978
alternative hypothesis: two.sided

My questions are:

1) Which of the tests is appropriate, the one with power = 1 or the one with power = 2? The documentation only says: "The power used in the loss function. Usually 1 or 2." Or is neither of them appropriate?

2) What exactly does the p-value mean, given that the null hypothesis is that the two methods have the same forecast accuracy?

Thanks for your responses.


Okay, I have made some progress on this topic.

The two-sided test with power = 1 from my first posting rejects the null hypothesis that model 1 and model 2 have the same accuracy (p-value = 0.04978 < 0.05).

To test whether model 2 is more accurate than model 1, we have to use a one-sided alternative:

> dm.test(squaredErrorsFromForecastModelOne , squaredErrorsFromForecastModelTwo, alternative=c("greater"), h=1, power=1)

        Diebold-Mariano Test

data:  squaredErrorsFromForecastModelOne and squaredErrorsFromForecastModelTwo
DM = 1.995, Forecast horizon = 1, Loss function power = 1, p-value = 0.02489
alternative hypothesis: greater

The p-value is lower than the 5% significance level, so we can reject the null hypothesis of equal accuracy in favour of the alternative that model 2 is more accurate than model 1.

However, these questions remain:

1) As I am using RMSE (Root Mean Square Error) as my accuracy measure, is it correct to use the squared errors from the forecast models as input?

2) Which value should I use in dm.test for the power of the loss function: 1 or 2?

Thanks.

Best Answer

First, you are supposed to supply raw forecast errors to the Diebold-Mariano test function dm.test. However, you are supplying squared forecast errors (the squaredErrorsFromForecastModel vectors defined in your question).
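
For illustration, here is a minimal sketch of what that looks like, using simulated actuals and forecasts (the raw errors are not given in the question; actual, forecastOne and forecastTwo are purely hypothetical names):

library(forecast)

# Purely illustrative data -- not the series from the question.
set.seed(1)
actual      <- rnorm(74)                      # observed values
forecastOne <- actual + rnorm(74, sd = 1.5)   # hypothetical forecasts from model 1
forecastTwo <- actual + rnorm(74, sd = 0.5)   # hypothetical forecasts from model 2

# dm.test() wants the raw (signed) forecast errors; the squaring or
# absolute value is applied internally via the 'power' argument.
errOne <- actual - forecastOne
errTwo <- actual - forecastTwo

dm.test(errOne, errTwo, h = 1, power = 2)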

Second, the choice of power is determined entirely by your loss function, as you noted, and only you know your loss function. Suppose you lose $x$ dollars if the forecast error is $x$. Then your loss function is linear and you should use the option power=1. On the other hand, your pain may grow quadratically, so that you lose $x^2$ dollars when the forecast error is $x$; then you should use power=2. If you are unsure about your own loss function, you could ask another question on this site giving the context of your application. But since at one point you say that you are using RMSE as the forecast accuracy measure, it may be sensible to use power=2 to be consistent.
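
To make the role of power concrete, here is a small sketch of the loss differential the test statistic is built on, reusing the hypothetical errOne and errTwo from the snippet above (this shows only the loss part, not the full dm.test computation):

# Loss differential d_t = |e1_t|^power - |e2_t|^power
dAbs <- abs(errOne)^1 - abs(errTwo)^1   # power = 1: absolute-error loss
dSq  <- abs(errOne)^2 - abs(errTwo)^2   # power = 2: squared-error loss

# The DM statistic is essentially mean(d) divided by an
# autocorrelation-consistent estimate of its standard error,
# so the choice of power decides which losses are being compared.
c(meanAbsLossDiff = mean(dAbs), meanSqLossDiff = mean(dSq))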

Third, the $p$-value tells you how likely you would be to observe a difference in the losses (due to the forecast errors) at least as large in magnitude as the one actually observed, if the losses were in fact equal in the population.
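
In symbols, writing $d_t$ for the loss differential between the two forecasts and $DM_{\text{obs}}$ for the observed test statistic, the two-sided $p$-value is roughly

$$p = \Pr\big(\, |DM| \ge |DM_{\text{obs}}| \;\big|\; \mathrm{E}[d_t] = 0 \,\big).$$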

Finally, I would not be comfortable with an approach like "I want to show statistically that model 2 has better accuracy". Shouldn't you care about finding out the truth as far as the available data and the statistical methods can take you? If model 1 were better than model 2 in reality, wouldn't you want to learn that? It may be tempting to abuse statistics to obtain the result you are wishing for, but... But perhaps I am misinterpreting you.