There can be no single state-of-the-art for goodness of fit (for example no UMP test across general alternatives will exist, and really nothing even comes close -- even highly regarded omnibus tests have terrible power in some situations).
In general when selecting a test statistic you choose the kinds of deviation that it's most important to detect and use a test statistic that is good at that job. Some tests do very well at a wide variety of interesting alternatives, making them decent default choices, but that doesn't make them "state of the art".
The Anderson-Darling test is still very popular, and with good reason. The Cramer-von Mises test is much less used these days -- to my surprise, because it's usually better than the Kolmogorov-Smirnov but simpler than the Anderson-Darling, and often has better power than the Anderson-Darling against differences "in the middle" of the distribution.
All of these tests suffer from bias against some kinds of alternatives, and it's easy to find cases where the Anderson-Darling does much worse (terribly, really) than the other tests. (As I suggest, it's more 'horses for courses' than one test to rule them all). There's often little consideration given to this issue (what's best at picking up the deviations that matter the most to me?), unfortunately.
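As a rough illustration of how power depends on the alternative, here is a small Python simulation using scipy (the 0.3 location shift, n = 50, and 500 replicates are my own arbitrary choices, and I compare only KS and Cramer-von Mises since those have fully-specified-null implementations in scipy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def rejection_rate(sampler, n=50, reps=500, alpha=0.05):
    """Fraction of simulated samples on which each test rejects
    the fully specified N(0,1) null at level alpha."""
    ks = cvm = 0
    for _ in range(reps):
        x = sampler(n)
        ks += stats.kstest(x, "norm").pvalue < alpha
        cvm += stats.cramervonmises(x, "norm").pvalue < alpha
    return ks / reps, cvm / reps

# A small location shift -- a change "in the middle" of the distribution.
ks_rate, cvm_rate = rejection_rate(lambda n: rng.normal(0.3, 1.0, n))
```

Swapping in a heavy-tailed or skewed sampler instead of the shift is an easy way to see the rankings change -- which is the "horses for courses" point.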
You may find some value in some of these posts:
Is Shapiro–Wilk the best normality test? Why might it be better than other tests like Anderson-Darling?
2 Sample Kolmogorov-Smirnov vs. Anderson-Darling vs Cramer-von-Mises (about two-sample tests, but many of the statements carry over)
Motivation for Kolmogorov distance between distributions (more theoretical discussion but there are several important points about practical implications)
I don't think you'll be able to form a confidence interval for the cdf from the Cramer-von Mises and Anderson-Darling statistics, because those criteria are based on all of the deviations rather than just the largest one.
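The Kolmogorov-Smirnov statistic, by contrast, does invert into a simultaneous confidence band for the cdf, because it bounds the single largest deviation. A minimal Python sketch using the Dvoretzky-Kiefer-Wolfowitz inequality (the function name and the 95% default are my own choices):

```python
import numpy as np

def dkw_band(x, alpha=0.05):
    """Simultaneous 1 - alpha confidence band for the true cdf, from the
    Dvoretzky-Kiefer-Wolfowitz inequality: sup_t |F_n(t) - F(t)| <= eps
    with probability >= 1 - alpha, where eps = sqrt(log(2/alpha) / (2 n))."""
    x = np.sort(x)
    n = x.size
    ecdf = np.arange(1, n + 1) / n
    eps = np.sqrt(np.log(2.0 / alpha) / (2.0 * n))
    return x, np.clip(ecdf - eps, 0.0, 1.0), np.clip(ecdf + eps, 0.0, 1.0)

xs, lower, upper = dkw_band(np.random.default_rng(0).normal(size=200))
```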
In Mathematica this works:
GPD = ParetoPickandsDistribution[2, 3, .07];
data = RandomVariate[GPD, 10^4];
FindDistributionParameters[data, ParetoPickandsDistribution[mu, sigma, eta]]
(* {mu -> 2.00036, sigma -> 2.96883, eta -> 0.07022} *)
where mu is the location parameter, sigma the scale parameter, and eta the shape parameter.
FindDistributionParameters can use 5 different methods (see the documentation), but I believe the default is maximum likelihood estimation (MLE). Mathematica has all the tools (Likelihood, LogLikelihood, FindMaximum, Maximize, and ParetoPickandsDistribution for the PDF) to do MLE from scratch, if you're so inclined. There is a good explanation of MLE on Wikipedia.
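For comparison, the same "MLE from scratch" idea can be sketched in Python with scipy. This assumes scipy's genpareto(c, loc, scale) corresponds to ParetoPickandsDistribution[mu, sigma, eta] with c = eta, loc = mu, scale = sigma (my mapping; check both documentation pages before relying on it):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)

# Simulate generalized Pareto data with the same parameters as above.
c_true, mu, sigma = 0.07, 2.0, 3.0
data = stats.genpareto.rvs(c_true, loc=mu, scale=sigma, size=10_000,
                           random_state=rng)

def neg_loglik(params, x, loc):
    # Parameterize the scale on the log axis so it stays positive.
    c, log_scale = params
    return -stats.genpareto.logpdf(x, c, loc=loc,
                                   scale=np.exp(log_scale)).sum()

# The GPD location MLE is degenerate (the likelihood keeps increasing as
# the location approaches the sample minimum), so fix it in advance.
loc = data.min() - 1e-9
res = optimize.minimize(neg_loglik, x0=[0.1, 0.0], args=(data, loc),
                        method="Nelder-Mead")
c_hat, scale_hat = res.x[0], float(np.exp(res.x[1]))
```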
Best Answer
A Cramer-von Mises test is for a fully specified distribution, not one whose parameters you estimated from the same data.
When you fit the parameters, the test statistic is nearly always smaller than it would be for a prespecified set of parameters. The fitted model will be too close to the data, so your actual significance level will be far smaller than you intend (and consequently power will also be low).
You can deal with it if you adjust the test for the fitting*, but it's no longer distribution-free.
*(e.g. by simulating the distribution of the test statistic under estimation and using that simulated null distribution rather than the tabulated distribution)
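That simulation approach can be sketched in Python with scipy (assumptions of this sketch, not of the answer above: a normal null family, moment-based parameter estimates, and 500 bootstrap replicates):

```python
import numpy as np
from scipy import stats

def cvm_pvalue_with_estimation(data, n_boot=500, seed=0):
    """Cramer-von Mises test of normality with parameters estimated from
    the data, calibrated by a parametric bootstrap."""
    rng = np.random.default_rng(seed)
    mu, sd = data.mean(), data.std(ddof=1)
    obs = stats.cramervonmises(data, stats.norm(mu, sd).cdf).statistic
    # Simulate from the *fitted* model, re-estimate on each simulated
    # sample, and recompute the statistic -- this reproduces the effect
    # of estimation on the null distribution of the statistic.
    null_stats = np.empty(n_boot)
    for b in range(n_boot):
        sim = rng.normal(mu, sd, size=data.size)
        m, s = sim.mean(), sim.std(ddof=1)
        null_stats[b] = stats.cramervonmises(sim, stats.norm(m, s).cdf).statistic
    # Add-one correction so the simulated p-value is never exactly zero.
    return (1 + np.sum(null_stats >= obs)) / (1 + n_boot)

p = cvm_pvalue_with_estimation(np.random.default_rng(3).normal(5.0, 2.0, 300))
```

Using scipy's tabulated p-value directly here would reproduce exactly the problem described above: the fitted cdf hugs the data, the statistic is too small, and the test almost never rejects.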