I developed two prediction models using non-linear regression, one based on a sigmoidal function and one based on a power function, to predict a set of values. How can I evaluate the prediction accuracy of these individual models (sigmoid predictions vs. power predictions) besides RMSE? Also, are there other hypothesis tests, such as the Chi-Square Goodness of Fit Test, that I can use to evaluate the prediction accuracy of these models?
Solved – accuracy of a regression prediction model
accuracy, prediction, regression
Related Solutions
First, you are supposed to supply raw forecast errors to the Diebold-Mariano test function dm.test. However, you are supplying squared forecast errors (in the text part above the separating line).
Second, the choice of power is entirely determined by your loss function, as you noted, and only you know your loss function. Suppose you lose $x$ dollars if the forecast error is $x$; then your loss function is linear and you should use the option power=1. On the other hand, your pain may grow quadratically, so that you lose $x^2$ dollars when the forecast error is $x$; then you should use power=2. If you are unsure about your own loss function, you may ask another question on this site giving the context of your application. But since at one point you say that you are using RMSE as the forecast accuracy measure, it may be sensible to use power=2 to be consistent.
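A minimal sketch of this in R, assuming actual, pred_sigmoid and pred_power are placeholder vectors holding the observed values and the two models' predictions (these names are illustrative, not from the question):

```r
library(forecast)

# raw (not squared) forecast errors, as dm.test expects
e1 <- actual - pred_sigmoid   # errors of model 1 (sigmoid)
e2 <- actual - pred_power     # errors of model 2 (power)

# power = 2 matches a squared-error (RMSE-style) loss; use power = 1 for absolute loss
dm.test(e1, e2, alternative = "two.sided", h = 1, power = 2)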
Third, the $p$-value tells us how likely you are to observe a difference in the losses (due to the forecast errors) at least as large as the one currently observed if the losses were actually equal in the population. Sorry for such a long sentence.
Finally, I would not be comfortable with an approach like "I want to underpin statistically that model 2 has better accuracy." Shouldn't you care about finding out the truth as far as the available data and statistical methods can help you? If model 1 were better than model 2 in reality, wouldn't you want to learn that? It may be tempting to abuse statistics to obtain the result you are wishing for, but... But perhaps I am misinterpreting you.
First of all, you are in the territory of problems that some very smart people have already worked on, which makes it almost impossible to invent something better than the rest of the community without an intense study of the existing literature. Trust me, I have tried.
In order to get good estimates of confidence, you need to look at ensemble models: if you want confidence in an opinion, ask many experts. Here there are two main families, both of which give very good scores.
Random Forest - average the predictions of many deep decision trees trained on bootstrap samples. Typically the first benchmark machine learning model.
Gradient Boosted Decision Trees - train each new decision tree on the residuals of the previous ones, thereby putting higher emphasis on previously badly modeled observations. Very often the best performing model on the data science competition site kaggle.com.
So my advice would be: use Random Forest or a gradient booster and trust the scores. The gradient booster is probably the closest model to your idea.
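A minimal sketch in R of both families, assuming train and test are placeholder data frames with a numeric response y and the same predictors (the package choices, randomForest and gbm, are the usual CRAN options, not something prescribed by the answer):

```r
library(randomForest)
library(gbm)

# Random forest: average of many deep trees grown on bootstrap samples
rf_fit  <- randomForest(y ~ ., data = train, ntree = 500)
rf_pred <- predict(rf_fit, newdata = test)

# Gradient boosting: each new tree is fit to the residuals of the ensemble so far
gbm_fit  <- gbm(y ~ ., data = train, distribution = "gaussian",
                n.trees = 1000, interaction.depth = 3, shrinkage = 0.01)
gbm_pred <- predict(gbm_fit, newdata = test, n.trees = 1000)

# Compare out-of-sample RMSE of the two ensembles
sqrt(mean((test$y - rf_pred)^2))
sqrt(mean((test$y - gbm_pred)^2))
```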
Best Answer
There are many measures of point prediction accuracy, like the MAE, the RMSE, or the MAPE. You may want to browse through our questions with the corresponding tags. Why use a certain measure of forecast error (e.g. MAD) as opposed to another (e.g. MSE)? is likely helpful, as may be What are the shortcomings of the Mean Absolute Percentage Error (MAPE)? and Mean absolute error OR root mean squared error?
This page discusses some of these KPIs in the context of time series forecasting.
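A minimal sketch of these measures in R, assuming actual and pred are placeholder numeric vectors of the same length (the MAPE additionally requires that actual contains no zeros):

```r
err  <- actual - pred
mae  <- mean(abs(err))             # mean absolute error
rmse <- sqrt(mean(err^2))          # root mean squared error
mape <- mean(abs(err / actual)) * 100  # mean absolute percentage error

c(MAE = mae, RMSE = rmse, MAPE = mape)
```

The accuracy() function from the forecast package reports the same measures (and a few more) in one call if you prefer not to compute them by hand.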