Hi all, I tabulated my data residuals and drew a boxplot of them. The median line is closer to the 75th percentile, so by rights the distribution should be positively skewed, and the MATLAB function skewness(data) returned a positive value, which seems to confirm that the residuals are positively skewed. However, if they are positively skewed, I would expect more observations on the positive side, yet of the 124 residuals I have, only 30-odd are positive. Are they still considered positively skewed, or is there something I missed? (And yes, my mean is smaller than my median.)
MATLAB: How does MATLAB determine skewness?
skewness
Related Solutions
Hi Matteo,
I don't believe there is really a problem here. Let Vt denote the transpose of the data matrix Values, so that Vt is 4x17 like you want. With
[coef score latent] = pca(Vt)
pca computes** the eigenvalue decomposition of the covariance matrix of Vt. The resulting eigenvectors are the columns of coef. Then score is computed with
score = Vt0*coef
where Vt0 is shown in the code below. Each eigenvector is real and normalized to unit length, but is still arbitrary to within an overall factor of ±1. Since all the columns of coef are orthogonal to each other, there is no foolproof way to assign those signs uniquely. It looks like the example you are using disagrees with MATLAB on the overall sign of the first column of coef, so the score matrix comes out with different signs as well. Nothing wrong with that.
What matters is that you can still relate the scores to the data. No matter what the overall signs of the columns of coef are, after you calculate scores that way it should still be true that
Vt0 = score*coef'
You can change the overall signs of coef columns and make your own coef, as in the example below. The resulting scores agree with the example.
Forget about salad. After looking at the png file, this all makes me want to fly to Belfast and eat fish and chips.
load('Example.mat')
Vt = Values';
[coef score lat] = pca(Vt);
% create new coef matrix and a new score matrix
coefnew = coef;
coefnew(:,1) = -coefnew(:,1);
Vt0 = Values' - mean(Values'); % covariance matrix calculation does this anyway
scorenew = Vt0*coefnew;
figure(1);
scatter(score(:,1), score(:,2))
figure(2);
scatter(scorenew(:,1), scorenew(:,2)) % same as example
Vt0_check = score*coef'
Vt0_check_new = scorenew*coefnew'
max(max(abs(Vt0-Vt0_check)))
max(max(abs(Vt0-Vt0_check_new)))
** it accomplishes this more accurately using svd instead of eig but with the same intent
It depends on what you want to do. In a symmetric distribution, the mean and median will be close if not equal. The mean is affected by extreme values, while the median is not. If you have any doubts as to the ‘best’ parameter, I would simply choose the median.
To illustrate:
x1 = [1 2 3 10];
x2 = [1 2 3 99];
x1_stats = [mean(x1) median(x1)]
x2_stats = [mean(x2) median(x2)]
x1_stats =
    4    2.5
x2_stats =
    26.25    2.5
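The same x2 vector can also be used to look at skewness directly. The following is only a minimal sketch, assuming the usual moment-based definition (third central moment divided by the second central moment raised to the power 1.5), which as far as I know is what skewness computes by default. The variable names d, m2, m3, skew_manual and n_above are made up for this illustration.

```matlab
% Sketch: moment-based sample skewness computed by hand (no toolbox needed).
% Assumption: this mirrors the default, biased form of skewness(x).
x2 = [1 2 3 99];
d  = x2 - mean(x2);            % deviations from the mean
m2 = mean(d.^2);               % second central moment (biased)
m3 = mean(d.^3);               % third central moment (biased)
skew_manual = m3 / m2^1.5      % sign is set by the cubed deviations
n_above = sum(x2 > mean(x2))   % how many values lie above the mean
```

Here the single large value dominates the cubed deviations, so the result is positive even though only one of the four values lies above the mean. The sign of the skewness reflects which tail is longer, not which side holds more observations.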
Best Answer