MATLAB: Why do I receive a different mean of the inverse CDF for a generalized Pareto distribution using GPINV than I would theoretically expect in Statistics Toolbox 7.1 (R2009a)?


I am using some random data to generate the inverse CDF for a generalized Pareto distribution with tail index (shape) parameter k, scale parameter sigma and threshold (location) parameter theta, as follows:
p = rand(10000,1);
sigma = 0.2531;
k = 0.9982;
theta = 0;
pp = gpinv(p,k,sigma,theta);
When I compare the theoretical mean "theta + sigma/(1-k)" with the calculated mean, I get a result that differs a lot from the expected value:
ppmean = mean(pp)
expmean = theta + sigma/(1-k)
ppmean is about 1.98 and expmean is about 140.
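For readers without the toolbox, here is a minimal, stdlib-only Python sketch of the same experiment. It assumes the standard generalized Pareto quantile function x = theta + (sigma/k)*((1-p)^(-k) - 1) for k != 0, which is the parameterization GPINV uses; gp_inv and run_experiment are hypothetical helper names, not part of any library.

```python
import random

def gp_inv(p, k, sigma, theta):
    """Generalized Pareto inverse CDF (quantile function).

    Sketch only: assumes k != 0, sigma > 0 and 0 <= p < 1.
    """
    if k == 0:
        raise ValueError("this sketch only covers k != 0")
    return theta + sigma * ((1.0 - p) ** (-k) - 1.0) / k

def run_experiment(n, k=0.9982, sigma=0.2531, theta=0.0, seed=0):
    """Draw n uniform p values, map them through gp_inv, and
    return (sample mean, theoretical mean)."""
    rng = random.Random(seed)
    # random() returns values in [0, 1), so 1 - p is never 0
    samples = [gp_inv(rng.random(), k, sigma, theta) for _ in range(n)]
    ppmean = sum(samples) / n
    expmean = theta + sigma / (1.0 - k)  # the mean exists only for k < 1
    return ppmean, expmean

if __name__ == "__main__":
    ppmean, expmean = run_experiment(10_000)
    print(f"ppmean = {ppmean:.2f}, expmean = {expmean:.2f}")
```

With these parameters the sample mean typically lands in the low single digits, far below the theoretical value of roughly 140.61.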

Best Answer

This is to be expected. Try the same experiment with p of length 10^6:
p = rand(10^6,1);
sigma = 0.2531;
k = 0.9982;
theta = 0;
pp = gpinv(p,k,sigma,theta);
You'll find that mean(pp) creeps up a bit. If you try 10^7, it creeps up a little more. Now try
hist(pp,100)
You'll see that the distribution is VERY skewed. In fact, it is so badly skewed that the sample mean, even in fairly large samples, is itself badly skewed, with a long right tail. So almost all of the time, you see a sample mean that is much smaller than the theoretical mean. Observing a value of, say, 10^12 (a made-up figure) is rare, but when it happens it produces a very large sample mean, which on average compensates for all the sample means that are small.
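The skewness of the sample mean itself can be checked by repeating the 10^4-draw experiment many times and looking at the spread of the resulting sample means. The Python sketch below does this with the standard library only; sample_means and the choice of 100 repetitions are illustrative assumptions, and gp_inv is the same hypothetical quantile helper as in a plain GP inverse-CDF implementation (assuming k != 0).

```python
import random
import statistics

def gp_inv(p, k, sigma, theta):
    # Generalized Pareto quantile function, assuming k != 0
    return theta + sigma * ((1.0 - p) ** (-k) - 1.0) / k

def sample_means(reps, n, k=0.9982, sigma=0.2531, theta=0.0, seed=1):
    """Repeat the n-draw experiment `reps` times; return all sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        total = 0.0
        for _ in range(n):
            total += gp_inv(rng.random(), k, sigma, theta)
        means.append(total / n)
    return means

if __name__ == "__main__":
    k, sigma, theta = 0.9982, 0.2531, 0.0
    expmean = theta + sigma / (1.0 - k)
    means = sample_means(reps=100, n=10_000)
    below = sum(m < expmean for m in means) / len(means)
    print(f"theoretical mean:       {expmean:.2f}")
    print(f"median of sample means: {statistics.median(means):.2f}")
    print(f"fraction below theory:  {below:.2f}")
```

Nearly every repetition should produce a sample mean well below the theoretical value; only a rare, extremely large draw pulls an individual sample mean above it.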
This is a case where the central limit theorem is of no practical help at any realistic sample size. The variance of the distribution is another indication that you're in trouble:
[m,v] = gpstat(0.9982,0.2531,0)
which returns
m =
140.61
v =
Inf
The variance is infinite because a generalized Pareto distribution has a finite variance only when k < 1/2 (and a finite mean only when k < 1), so the classical central limit theorem does not apply here at all. The best approach is therefore either to use different values for k, sigma and theta, or to increase the number of values in p. A larger sample brings the sample mean closer to the theoretical mean, although only very slowly when k is this close to 1.
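As a sketch of where the gpstat numbers come from, the Python function below (gp_stats is a hypothetical name, not a library function) evaluates the standard generalized Pareto moments: mean = theta + sigma/(1-k) for k < 1 and variance = sigma^2/((1-k)^2 (1-2k)) for k < 1/2. It reports infinity when a moment does not exist, which matches the m and v shown above for k = 0.9982.

```python
import math

def gp_stats(k, sigma, theta):
    """Mean and variance of a generalized Pareto distribution.

    Returns math.inf for a moment that does not exist:
    the mean is finite only for k < 1, the variance only for k < 1/2.
    """
    mean = theta + sigma / (1.0 - k) if k < 1 else math.inf
    if k < 0.5:
        var = sigma ** 2 / ((1.0 - k) ** 2 * (1.0 - 2.0 * k))
    else:
        var = math.inf
    return mean, var

if __name__ == "__main__":
    m, v = gp_stats(0.9982, 0.2531, 0.0)
    print(f"m = {m:.2f}")  # m = 140.61
    print(f"v = {v}")      # v = inf
```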