For releases prior to MATLAB R2017b, here is a detailed explanation of how this is implemented within MATLAB:
1. Computation of the reference line:
Suppose random variable X has a lognormal distribution. Then by definition, log(X) has a normal distribution. As a result, there exist constants A and B > 0 such that log(Y) = A + B*log(X) has a normal distribution with mean mu = 0 and standard deviation sigma = 1. Equivalently, Y has a lognormal distribution with parameters mu = 0 and sigma = 1. Since B > 0,
Prob( X <= exp(t) ) = Prob( log(X) <= t ) = Prob( (log(Y) - A)/B <= t ) = Prob( log(Y) <= A + B*t ) = Prob( Y <= exp(A + B*t) ) = FY( exp(A + B*t) )
Where FY( . ) is the CDF of a lognormal distribution with parameters mu = 0 and sigma = 1.
Suppose we are given N observations of X, say x_1, x_2, ..., x_N. To estimate A and B, probplot uses the 25 and 75 percentiles of values x_i. Let q25 = 0.25 and q75 = 0.75. Let exp(t25) and exp(t75) be such that:
Prob( X <= exp(t25) ) = q25
Prob( X <= exp(t75) ) = q75
exp(t25) can be approximated by quantile([x_1,...,x_N],q25). Similar logic holds for approximating exp(t75). Therefore, we can approximate t25 and t75 like this:
where invFY( . ) is the inverse CDF of a lognormal distribution with parameters mu = 0 and sigma = 1. Taking natural logs:
log(invFY(q25)) = A + B*t25
log(invFY(q75)) = A + B*t75
These two equations can now be used to solve for A and B. Notice that B > 0 as required.
2. Lines created in probplot and the y-axis labeling:
Suppose x_(1), x_(2),...,x_(N) are the sorted observations in increasing order. I assume no ties in the x_(i) to keep things simple. For very small delta, Prob( X <= x_(i) - delta ) ~= (i-1)/N and Prob( X <= x_(i) + delta ) ~= i/N. Hence, a sensible estimate of
Prob( X <= x_(i) ) ~= average of (i-1)/N and i/N = (i - 0.5)/N (the midpoint rule).
If x_i's do really come from a lognormal distribution then [using t = log(x_(i)) in the first equation]:
(i - 0.5)/N ~= Prob( X <= x_(i) ) = Prob( X <= exp(log(x_(i))) ) = FY( exp(A + B*log(x_(i))) )
or
log( invFY( (i - 0.5)/N ) ) ~= A + B*log(x_(i))
We would expect a plot of log(x_(i)) vs. log( invFY( (i - 0.5)/N ) ) to be a straight line. We plot two lines:
(a) A plot of log(x_(i)) vs. A + B*log(x_(i)) [the expected behavior if x_i's do really come from a lognormal distribution]
(b) A plot of log(x_(i)) vs. log( invFY( (i - 0.5)/N ) ) [the observed behavior]
Then we visually compare (a) and (b). We could plot (a) for a range of x values not necessarily restricted to x_(i). If you put a breakpoint in the probplot subfunction setprobticks, you can see that probplot changes thelabels on the y-axis to be on the probability scale. The datatips are converted into probabilities in the probplotDatatipCallback listener.
If y is a value on the y-axis, then for some p, y = log( invFY( p ) ). Inverting FY(exp(y)) = p. The tick label for y is thestring p.
Kstest(x)returns a test decision for the null hypothesis that the data in vector x comes from a specified standard normal distribution or it does not come from such a distribution
If the result is one the test rejects the null hypothesis at default significance level(5%)
From your data
Data=ObservedData(:,1);
x=(Data-3.2)/0.44; %specified mean and standard deviation
r=kstest(x) ;
Similarly ,for a bimodal distribution, check the kstest for each distribution seperately
Best Answer