Hello Gabri,
in short, the difference is due to numerical (round-off) error.
For example, in the ordinary full-rank case
rng('default')
X=randn(1000,5);
sv=svd(X);
svchk=sort(eig((X'*X)^.5),'descend');
disp([sv svchk])
the output of the two methods virtually coincides (svd on the left, eig on the right):
32.805793753218964 32.805793753218971
32.596688238949213 32.596688238949206
31.830948559876663 31.830948559876632
30.587981509539315 30.587981509539343
29.375388175424334 29.375388175424348
In the case below, columns 4 and 5 are linear combinations of the first 3 columns, so two singular values are exactly 0 in theory (similar to your case). Here is what happens:
rng('default')
rng(10)
n=1000;
X13=1000*randn(n,3);
X4=X13(:,1)*100+X13(:,3)*200;
X5=X13(:,2)+11200*X13(:,3);
X=[X13 X4 X5];
sv=svd(X);
svchk=sort(eig((X'*X)^.5),'descend');
disp([sv svchk])
1.0e+08 *
3.649002137098263 3.649002137098261
0.031607898246415 0.031607898246415
0.000320576287875 0.000320576287875
0.000000000000000 0.000000000023097
0.000000000000000 0.000000000002413
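The singular values that are not equal to 0 are virtually the same with the two methods.
The singular values close to 0 (as in your case) can differ slightly between svd and eig.
In any case, the singular values computed with svd are more reliable: svd works on X directly, whereas the eig route first forms X'*X, which squares the condition number and amplifies round-off error in the smallest values. Here the true values are 0, and svd gets much closer to 0 than eig does.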
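To decide in practice which singular values should be treated as zero, one option is to compare them with a tolerance. The sketch below is only an illustration: it rebuilds the rank-deficient X from above and uses the default tolerance documented for rank, max(size(X))*eps(norm(X)); the variable names tol and numRank are mine.

```matlab
% Sketch: count the singular values that are numerically nonzero.
rng(10)
X13 = 1000*randn(1000,3);
X = [X13, X13(:,1)*100 + X13(:,3)*200, X13(:,2) + 11200*X13(:,3)];

sv = svd(X);                          % singular values, in descending order
tol = max(size(X)) * eps(max(sv));    % max(sv) equals norm(X), as in rank's default
numRank = sum(sv > tol);              % number of singular values above the tolerance
disp(numRank)
```

With the numbers printed above, this tolerance is roughly 6e-5, so the two smallest svd values should fall below it, while the eig-based estimates (about 2.3e-3 and 2.4e-4) would not. That is another way of seeing why svd is the safer choice here.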
One final remark: it is always better to start by standardizing the data, in order to avoid very small or very large numbers and the numerical errors they can cause.
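A minimal sketch of what I mean by standardizing, assuming column-wise z-scores (subtract each column mean, divide by each column standard deviation) and a MATLAB release with implicit expansion (R2016b or later). The names Z and svZ are just for illustration.

```matlab
% Sketch: standardize the columns of X before computing singular values.
rng(10)
X13 = 1000*randn(1000,3);
X = [X13, X13(:,1)*100 + X13(:,3)*200, X13(:,2) + 11200*X13(:,3)];

Z = (X - mean(X)) ./ std(X);   % each column now has mean 0 and standard deviation 1
svZ = svd(Z);                  % singular values of the standardized data
disp(svZ)
```

Centering and rescaling the columns does not remove the linear dependence, so two singular values are still zero in theory, but all columns are now on a comparable scale. If you have the Statistics and Machine Learning Toolbox, zscore(X) should give the same Z.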
Hope it helps
Marco