MATLAB: How are the fitted line and the y-axis (probabilities) levels within probplot computed

Statistics and Machine Learning Toolbox

Exactly how are the fitted line and the reference levels on the y axis computed for probplot?

I could not find that piece of information in the documentation.

Best Answer

This information has been added to the documentation as of MATLAB R2017b, as seen in the "algorithms" section of the "probplot" page.

https://www.mathworks.com/help/releases/R2017b/stats/probplot.html#d119e599292

For releases prior to MATLAB R2017b, here is a detailed explanation of how this is implemented within MATLAB:

1. Computation of the reference line:

Suppose random variable X has a lognormal distribution. Then by definition, log(X) has a normal distribution. As a result, there exist constants A and B > 0 such that log(Y) = A + B*log(X) has a normal distribution with mean mu = 0 and standard deviation sigma = 1. Equivalently, Y has a lognormal distribution with parameters mu = 0 and sigma = 1. Since B > 0,

Prob( X <= exp(t) ) = Prob( log(X) <= t ) = Prob( (log(Y) - A)/B <= t ) = Prob( log(Y) <= A + B*t ) = Prob( Y <= exp(A + B*t) ) = FY( exp(A + B*t) )

Where FY( . ) is the CDF of a lognormal distribution with parameters mu = 0 and sigma = 1.

Suppose we are given N observations of X, say x_1, x_2, ..., x_N. To estimate A and B, probplot uses the 25 and 75 percentiles of values x_i. Let q25 = 0.25 and q75 = 0.75. Let exp(t25) and exp(t75) be such that:

Prob( X <= exp(t25) ) = q25

Prob( X <= exp(t75) ) = q75

exp(t25) can be approximated by quantile([x_1,...,x_N],q25). Similar logic holds for approximating exp(t75). Therefore, we can approximate t25 and t75 like this:

t25 ~= log(quantile([x_1,...,x_N],q25)).

t75 ~= log(quantile([x_1,...,x_N],q75)).

Now,

q25 = Prob( X <= exp(t25) ) = FY( exp(A + B*t25) )

q75 = Prob( X <= exp(t75) ) = FY( exp(A + B*t75) )

Equivalently,

invFY(q25) = exp(A + B*t25)

invFY(q75) = exp(A + B*t75)

where invFY( . ) is the inverse CDF of a lognormal distribution with parameters mu = 0 and sigma = 1. Taking natural logs:

log(invFY(q25)) = A + B*t25

log(invFY(q75)) = A + B*t75

These two equations can now be used to solve for A and B. Notice that B > 0 as required.

2. Lines created in probplot and the y-axis labeling:

Suppose x_(1), x_(2),...,x_(N) are the sorted observations in increasing order. I assume no ties in the x_(i) to keep things simple. For very small delta, Prob( X <= x_(i) - delta ) ~= (i-1)/N and Prob( X <= x_(i) + delta ) ~= i/N. Hence, a sensible estimate of

Prob( X <= x_(i) ) ~= average of (i-1)/N and i/N = (i - 0.5)/N (the midpoint rule).

If x_i's do really come from a lognormal distribution then [using t = log(x_(i)) in the first equation]:

(i - 0.5)/N ~= Prob( X <= x_(i) ) = Prob( X <= exp(log(x_(i))) ) = FY( exp(A + B*log(x_(i))) )

log( invFY( (i - 0.5)/N ) ) ~= A + B*log(x_(i))

We would expect a plot of log(x_(i)) vs. log( invFY( (i - 0.5)/N ) ) to be a straight line. We plot two lines:

(a) A plot of log(x_(i)) vs. A + B*log(x_(i)) [the expected behavior if x_i's do really come from a lognormal distribution]

(b) A plot of log(x_(i)) vs. log( invFY( (i - 0.5)/N ) ) [the observed behavior]

Then we visually compare (a) and (b). We could plot (a) for a range of x values not necessarily restricted to x_(i). If you put a breakpoint in the probplot subfunction setprobticks, you can see that probplot changes the labels on the y-axis to be on the probability scale. The datatips are converted into probabilities in the probplotDatatipCallback listener.

If y is a value on the y-axis, then for some p, y = log( invFY( p ) ). Inverting FY(exp(y)) = p. The tick label for y is the string p.

Related Solutions

MATLAB: KStest for unimodal and bimodal models

Hi,

Kstest(x) returns a test decision for the null hypothesis that the data in vector x comes from a specified standard normal distribution or it does not come from such a distribution

If the result is one the test rejects the null hypothesis at default significance level(5%)

From your data

Data=ObservedData(:,1); 
x=(Data-3.2)/0.44; %specified mean and standard deviation
r=kstest(x) ;

Similarly ,for a bimodal distribution, check the kstest for each distribution seperately

Refer to the link kstest for more information

MATLAB: Plotting error: Need to plot the evolution of symbolic matrix

Try this:

PHI = @(t) [ 1, t, t/3-(2*exp(-3*t))/9+2/9, (2*t)/3+(2*exp(-3*t))/9-2/9;
    0, 1, (5*exp(-3*t))/12-(3*exp(t))/4+1/3, 2/3-exp(t)/4-(5*exp(-3*t))/12;
    0, 0, exp(-3*t)/4+(3*exp(t))/4, exp(t)/4-exp(-3*t)/4;
    0, 0, (3*exp(t))/4-(3*exp(-3*t))/4, (3*exp(-3*t))/4+exp(t)/4];
N = @(t) norm(PHI(t));
t = linspace(0, 1, 50);
for k = 1:numel(t)
    Nt(k) = N(t(k));
end
figure
plot(t, Nt)
grid

Best Answer

Related Solutions

MATLAB: KStest for unimodal and bimodal models

MATLAB: Plotting error: Need to plot the evolution of symbolic matrix

Related Question